Preface


ASIACRYPT 2009, the 15th International Conference on the Theory and Application
of Cryptology and Information Security, was held in Tokyo, Japan, during
December 6-10, 2009. The conference was sponsored by the International As- 
sociation for Cryptologic Research (IACR) in cooperation with the Technical 
Group on Information Security (ISEC) of the Institute of Electronics, Informa- 
tion and Communication Engineers (IEICE). ASIACRYPT 2009 was chaired by 
Eiji Okamoto and I had the honor of serving as the Program Chair. 

The conference received 300 submissions from which two papers were with- 
drawn. Each paper was assigned at least three reviewers, and papers co-authored 
by Program Committee members were assigned at least five reviewers. The review
process took eight weeks and consisted of two stages. In the first four-week
stage, each Program Committee member individually read and evaluated
assigned papers (individual review phase), and in the second four-week stage, 
the papers were scrutinized with an extensive discussion (discussion phase). The 
review reports and discussion comments reached a total of 50,000 lines. 

Finally, the Program Committee decided to accept 42 submissions, of which
two submissions were merged into one paper. As a result, 41 presentations were 
given at the conference. The authors of the accepted papers had four weeks to 
prepare final versions for these proceedings. These revised papers were not sub- 
ject to editorial review and the authors bear full responsibility for their contents. 
Unfortunately, there were a number of good papers that could not be included
in the program due to this year’s tough competition. 

Tatsuaki Okamoto delivered the 2009 IACR Distinguished Lecture. The Pro- 
gram Committee decided to give the Best Paper Award of ASIACRYPT 2009 to 
the following paper: “Improved Generic Algorithms for 3-Collisions” by Antoine 
Joux and Stefan Lucks. They received an invitation to submit a full version to 
the Journal of Cryptology. In addition to the papers included in this volume, 
the conference also featured a rump session, a forum for short and entertaining 
presentations on recent works of both a technical and non-technical nature. 

There are many people who contributed to the success of ASIACRYPT 2009. 
First I would like to thank all authors for submitting their papers to the con- 
ference. I am deeply grateful to the Program Committee for giving their time, 
expertise and enthusiasm in order to ensure that each paper received a thorough 
and fair review. Thanks also to 303 external reviewers, listed on the following 
pages, for contributing their time and expertise. Finally, I would like to thank 
Shai Halevi for maintaining his excellent Web Submission and Review Software. 
Without this system, which covers all processes from paper submission to prepa- 
ration of the proceedings, I could not have handled 300 papers so smoothly. 

Related-Key Cryptanalysis of the Full AES-192 and AES-256


Alex Biryukov and Dmitry Khovratovich 


University of Luxembourg 


Abstract. In this paper we present two related-key attacks on the full 
AES. For AES-256 we show the first key recovery attack that works for 
all the keys and has $2^{99.5}$ time and data complexity, while the recent
attack by Biryukov-Khovratovich-Nikolić works for a weak key class and
has much higher complexity. The second attack is the first cryptanalysis 
of the full AES-192. Both our attacks are boomerang attacks, which are 
based on the recent idea of finding local collisions in block ciphers and 
enhanced with the boomerang switching techniques to gain free rounds 
in the middle. 


Keywords: AES, related-key attack, boomerang attack. 


The extended version of this paper is available at 
http://eprint.iacr.org/2009/317.pdf 


1 Introduction 


The Advanced Encryption Standard (AES) [9], a 128-bit block cipher, is one
of the most popular ciphers in the world and is widely used for both commercial
and government purposes. It has three variants which offer different security
levels based on the length of the secret key: 128, 192, or 256 bits. Since it became
a standard in 2001 [[], the progress in its cryptanalysis has been very slow. The 
best results until 2009 were attacks on 7-round AES-128 [0M], 10-round AES- 
192 B3], 10-round AES-256 out of 10, 12 and 14 rounds respectively. 
The two last results are in the related-key scenario. 

Only recently was the first attack on the full AES-256 announced [6]. The
authors showed a related-key attack which works with complexity $2^{96}$ for one
out of every $2^{35}$ keys. They have also shown practical attacks on AES-256 (see
also [7]) in the chosen key scenario, which demonstrates that AES-256 cannot
serve as a replacement for an ideal cipher in theoretically sound constructions 
such as Davies-Meyer mode. 

In this paper we improve these results and present the first related-key attack 
on AES-256 that works for all the keys and has a better complexity ($2^{99.5}$ data
and time). We also develop the first related-key attack on the full AES-192.
In both attacks we minimize the number of active S-boxes in the key-schedule
(which caused the previous attack on AES-256 to work only for a fraction of all
keys) by using a boomerang attack enhanced with boomerang switching techniques.
We find our boomerang differentials by searching for local collisions
in a cipher. The complexities of our attacks and a comparison with the best
previous attacks are given in Table 1.

Table 1. Best attacks on AES-192 and AES-256
[Table body not recoverable from the extracted text. Legible row labels: partial
sums, related-key rectangle, and amplified boomerang for AES-192; partial sums
(9 rounds), related-key rectangle (10 rounds), and related-key differential
(14 rounds) for AES-256.]

This paper is structured as follows: In Section 3 we develop the idea of local
collisions in the cipher and show how to construct optimal related-key differentials
for AES-192 and AES-256. In Section 4 we briefly explain the idea of a
boomerang and an amplified boomerang attack. In Sections 5 and 6 we describe
an attack on AES-256 and AES-192, respectively.


2 AES Description and Notation 


We expect that most of our readers are familiar with the description of AES and 
thus point out only the main features of AES-256 that are crucial for our attack. 

AES rounds are numbered from 1 to 14 (12 for AES-192). We denote the i-th
subkey (256 bits for AES-256 and 192 bits for AES-192; do not confuse it with a
128-bit round key) by $K^i$, i.e. the first (whitening) subkey is the first four
columns of $K^0$. The last subkey is $K^7$ in AES-256 and $K^8$ in AES-192. The
difference in $K^i$ is denoted by $\Delta K^i$. Bytes of a subkey are denoted by
$k^l_{i,j}$, where $i, j$ stand for the row and column index, respectively, in the
standard matrix representation of AES, and $l$ stands for the number of the
subkey. Bytes of the plaintext are denoted by $p_{i,j}$, and bytes of the
internal state after the SubBytes transformation in round $r$ are denoted by
$a^r_{i,j}$, with $A^r$ depicting the whole state. Let us also denote by
$b^r_{i,j}$ the byte in position $(i, j)$ after the r-th application of MixColumns.


Features of AES-256. AES-256 has 14 rounds and a 256-bit key, which is two 
times larger than the internal state. Thus the key schedule consists of only 7 
rounds. One key schedule round consists of the following transformations: 


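$$
\begin{aligned}
k^{l+1}_{i,0} &= S(k^{l}_{i+1,7}) \oplus k^{l}_{i,0} \oplus C^{l}, & 0 \le i \le 3;\\
k^{l+1}_{i,j} &= k^{l+1}_{i,j-1} \oplus k^{l}_{i,j}, & 0 \le i \le 3,\ 1 \le j \le 3;\\
k^{l+1}_{i,4} &= S(k^{l+1}_{i,3}) \oplus k^{l}_{i,4}, & 0 \le i \le 3;\\
k^{l+1}_{i,j} &= k^{l+1}_{i,j-1} \oplus k^{l}_{i,j}, & 0 \le i \le 3,\ 5 \le j \le 7,
\end{aligned}
$$

where $S()$ stands for the S-box, $C^l$ for the round-dependent constant, and
the row index $i+1$ is taken modulo 4. Therefore, each round has 8 S-boxes.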


Features of AES-192. AES-192 has 12 rounds and a 192-bit key, which is 1.5 
times larger than the internal state. Thus the key schedule consists of 8 rounds. 
One key schedule round consists of the following transformations: 


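$$
\begin{aligned}
k^{l+1}_{i,0} &= S(k^{l}_{i+1,5}) \oplus k^{l}_{i,0} \oplus C^{l}, & 0 \le i \le 3;\\
k^{l+1}_{i,j} &= k^{l+1}_{i,j-1} \oplus k^{l}_{i,j}, & 0 \le i \le 3,\ 1 \le j \le 5.
\end{aligned}
$$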


Notice that each round has only four S-boxes. 
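The two key schedule rounds described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the function names (`gf_mul`, `build_sbox`, `key_schedule_round`) and the column-list representation of a subkey are assumptions made here. Following FIPS-197, the round constant is XORed into row 0 only, and the S-box is generated from its algebraic definition rather than typed in as a table.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """AES S-box: field inversion followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):              # x^254 is the inverse of x (0 maps to 0)
            inv = gf_mul(inv, x)
        y = 0x63
        for s in range(5):                # inv ^ rotl(inv,1) ^ ... ^ rotl(inv,4) ^ 0x63
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y)
    return sbox


SBOX = build_sbox()


def key_schedule_round(cols, rcon):
    """One key schedule round on a subkey given as 8 (AES-256) or 6 (AES-192)
    columns of 4 bytes each; returns the next subkey in the same layout."""
    nk = len(cols)
    new = []
    for j in range(nk):
        if j == 0:
            prev = cols[nk - 1]
            # row i receives S(k_{i+1, last column}); the constant hits row 0 only
            t = [SBOX[prev[(i + 1) % 4]] for i in range(4)]
            t[0] ^= rcon
        elif nk == 8 and j == 4:
            t = [SBOX[b] for b in new[3]]   # the extra S-box layer of AES-256
        else:
            t = new[j - 1]
        new.append([t[i] ^ cols[j][i] for i in range(4)])
    return new


def to_cols(key_hex):
    """Split a hex key string into 4-byte columns."""
    raw = bytes.fromhex(key_hex)
    return [list(raw[4 * j:4 * j + 4]) for j in range(len(raw) // 4)]


K256 = to_cols("603deb1015ca71be2b73aef0857d77811f352c073b6137d72d9810a30914dff4")
K192 = to_cols("8e73b0f7da0e6452c810f32b809079e562f8ead2522c6b7b")
NEXT256 = key_schedule_round(K256, 0x01)
NEXT192 = key_schedule_round(K192, 0x01)
```

The two example keys are the key expansion test keys from FIPS-197, so the first column of each computed subkey can be compared against the published expansion. Counting S-box lookups in the loop gives 8 per round for AES-256 and 4 per round for AES-192, as stated in the text.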


3 Local Collisions in AES 


The notion of a local collision comes from the cryptanalysis of hash functions
with one of the first applications by Chabaud and Joux [B]. The idea is to
inject a difference into the internal state, causing a disturbance, and then to
correct it with the next injections. The resulting difference pattern is spread
out due to the message schedule, causing more disturbances in other rounds. The
goal is to have as few disturbances as possible in order to reduce the
complexity of the attack.

In the related-key scenario we are allowed to have a difference in the key, and
not only in the plaintext as in pure differential cryptanalysis. However, the
attacker cannot control the key itself and thus the attack should work for any
key pair with a given difference.

Fig. 1. A local collision in AES-256
[Figure: a disturbance injected by one key schedule round passes through
SubBytes, ShiftRows, and MixColumns and is cancelled by the correction from the
next key schedule round.]


Local collisions in AES-256 are best understood on a one-round example (Fig. 1),
which has one active S-box in the internal state, and five non-zero byte differences
in the two consecutive subkeys. This differential holds with probability $2^{-6}$ if we
use an optimal differential for an S-box:

$$
\texttt{0x01} \xrightarrow{\text{SubBytes}} \texttt{0x1f}; \qquad
\begin{pmatrix} \texttt{0x1f} \\ 0 \\ 0 \\ 0 \end{pmatrix}
\xrightarrow{\text{MixColumns}}
\begin{pmatrix} \texttt{0x3e} \\ \texttt{0x1f} \\ \texttt{0x1f} \\ \texttt{0x21} \end{pmatrix}
$$
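The two steps of this one-round local collision can be checked numerically. The sketch below is an illustration added here, not the authors' code; the helper names are assumptions. It counts how many inputs of the AES S-box map the input difference 0x01 to the output difference 0x1f, and applies MixColumns to the column (0x1f, 0, 0, 0).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """AES S-box from its definition: inversion, then the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        y = 0x63
        for s in range(5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y)
    return sbox


SBOX = build_sbox()


def ddt_row(din):
    """Number of inputs x with S(x) ^ S(x ^ din) == dout, for every dout."""
    counts = [0] * 256
    for x in range(256):
        counts[SBOX[x] ^ SBOX[x ^ din]] += 1
    return counts


def mix_column(col):
    """AES MixColumns applied to a single 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
        a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
        a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
        gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3),
    ]


ROW = ddt_row(0x01)
SBOX_PROBABILITY = ROW[0x1F] / 256          # probability of 0x01 -> 0x1f
CORRECTION = mix_column([0x1F, 0, 0, 0])    # difference the next subkey must cancel
```

The resulting column is the difference that the key schedule has to supply in the next subkey, which is why the bytes 0x3e, 0x1f, 0x1f, and 0x21 recur throughout the key schedule difference tables.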


Due to the key schedule the differences spread to other subkeys thus forming 
the key schedule difference. The resulting key schedule difference can be viewed 
as a set of local collisions, where the expansion of the disturbance (also called 
disturbance vector) and the correction differences compensate each other. The 
probability of the full differential trail is then determined by the number of 
active S-boxes in the key-schedule and in the internal state. The latter is just 
the number of the non-zero bytes in the disturbance vector. 

Therefore, to construct an optimal trail we have to construct a minimal-weight 
disturbance expansion, which will become a part of the full key schedule differ- 
ence. For the AES key schedule, which is mostly linear, this task can be viewed 
as building a low-weight codeword of a linear code. Simultaneously, correction 
differences also form a codeword, and the key schedule difference codeword is 
the sum of the disturbance and the correction codewords. In the simplest trail 
the correction codeword is constructed from the former one by just shifting four 
columns to the right and applying the S-box—MixColumns transformation. 

An example of a good key-schedule pattern for AES-256 is depicted in Figure 2
as a 4.5-round codeword. In the first four key-schedule rounds the disturbance 
codeword has only 9 active bytes (red cells in the picture), which is the lower 
bound. We want to avoid active S-boxes in the key schedule as long as possible, 
so we start with a single-byte difference in byte kj and go backwards. Due to 
a slow diffusion in the AES key schedule the difference affects only one more 
byte per key schedule round. The correction (grey) column should be positioned 
four columns to the right, and propagates backwards in the same way. The last 
column in the first subkey is active, so all S-boxes of the first round are active 
as well, which causes an unknown difference in the first (green) column. This 
“alien” difference should be canceled by the plaintext difference. 


Fig. 2. Full key schedule difference (4.5 key-schedule rounds) for AES-256
[Figure: the disturbance codeword and the correction codeword.]

4 Related Key Boomerang and Amplified Boomerang Attacks


In this section we describe two types of boomerang attacks in the related-key 
scenario. 

A basic boomerang distinguisher is applied to a cipher $E_K(\cdot)$ which is
considered as a composition of two sub-ciphers: $E_K(\cdot) = E_1 \circ E_0$. The
first sub-cipher is supposed to have a differential $\alpha \to \beta$, and the
second one to have a differential $\gamma \to \delta$, with probabilities $p$ and
$q$, respectively. In the further text the differential trails of $E_0$ and $E_1$
are called upper and lower trails, respectively.

In the boomerang attack a plaintext pair results in a quartet with probability
$p^2 q^2$. The amplified boomerang attack [TJ] (also called rectangle attack) works
in a chosen-plaintext scenario and constructs $N^2 p^2 q^2 2^{-n}$ quartets of $N$
plaintext pairs, where $n$ is the block size in bits. We refer to the original
papers for the full description of the attacks.
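The two quartet counts above are easier to compare when written as base-2 logarithms. The following sketch is only an illustration: the function names are assumptions made here, and the example uses the trail probabilities $p = 2^{-30}$ and $q = 2^{-18}$ that appear later in Section 5.1 together with the AES block size $n = 128$.

```python
def log2_boomerang_probability(log2_p, log2_q):
    """log2 of p^2 * q^2, the chance that one plaintext pair yields a right quartet."""
    return 2 * (log2_p + log2_q)


def log2_amplified_quartets(log2_pairs, log2_p, log2_q, block_bits):
    """log2 of N^2 * p^2 * q^2 * 2^(-n), the expected number of right quartets
    built from N plaintext pairs in the amplified boomerang (rectangle) attack."""
    return 2 * log2_pairs + 2 * (log2_p + log2_q) - block_bits


# Example values: p = 2^-30, q = 2^-18, n = 128.
BOOMERANG = log2_boomerang_probability(-30, -18)
# With N = 2^112 pairs the amplified attack expects about one right quartet.
AMPLIFIED = log2_amplified_quartets(112, -30, -18, 128)
```

The extra factor $2^{-n}$ is the price of the chosen-plaintext setting: the amplified attack cannot force the middle collision through adaptive decryption queries, so it needs correspondingly more pairs.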

In the original boomerang attack paper by Wagner it was noted that
the number of good ciphertext quartets is actually higher, since an attacker may
consider other $\beta$ and $\gamma$ (with the same $\alpha$ and $\delta$). This observation can be applied
to both types of boomerang attacks. As a result, the number $Q$ of good quartets
is expressed via amplified probabilities $\hat{p}$ and $\hat{q}$ as follows:


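$$
Q = \hat{p}^2 \hat{q}^2 2^{-n} N^2,
$$

where

$$
\hat{p} = \sqrt{\sum_{\beta} P[\alpha \to \beta]^2}; \qquad
\hat{q} = \sqrt{\sum_{\gamma} P[\gamma \to \delta]^2}. \qquad (1)
$$


4.1 Related-Key Attack Model
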


The related-key attack model [3] is a class of cryptanalytic attacks in which the 
attacker knows or chooses a relation between several keys and is given access to 
encryption/decryption functions with all these keys. The goal of the attacker is 
to find the actual secret keys. The relation between the keys can be an arbitrary 
bijective function R (or even a family of such functions) chosen in advance by 
the attacker (for a formal treatment of the general related key model see BMA). 
In the simplest form of this attack, this relation is just a XOR with a constant: 
$K_2 = K_1 \oplus C$, where the constant $C$ is chosen by the attacker. This type of
relation allows the attacker to trace the propagation of XOR differences induced 
by the key difference C through the key schedule of the cipher. However, more 
complex forms of this attack allow other (possibly non-linear) relations between 
the keys. For example, in some of the attacks described in this paper the attacker 
chooses a desired XOR relation in the second subkey, and then defines the implied 
relation between the actual keys as: $K_2 = F^{-1}(F(K_1) \oplus C) = R_C(K_1)$, where
$F$ represents a single round of the AES-256 key schedule, and the constant $C$ is
chosen by the attacker.¹
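The subkey-level relation $K_2 = F^{-1}(F(K_1) \oplus C)$ can be made concrete with a small sketch. Everything below is illustrative: the function names, the column-list key layout, and the randomly drawn difference `C` are assumptions made here and are not the differences used in the attack. The only constraint placed on `C` is a zero fourth column, so that the S-box layer in the middle of the key schedule round sees no difference.

```python
import random


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        y = 0x63
        for s in range(5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y)
    return sbox


SBOX = build_sbox()


def xor_col(a, b):
    return [x ^ y for x, y in zip(a, b)]


def xor_key(a, b):
    return [xor_col(ca, cb) for ca, cb in zip(a, b)]


def sub_word(col):
    return [SBOX[b] for b in col]


def g(col, rcon):
    """RotWord, SubWord, then the round constant in row 0."""
    t = sub_word(col[1:] + col[:1])
    t[0] ^= rcon
    return t


def F(key, rcon=1):
    """One AES-256 key schedule round on 8 columns of 4 bytes."""
    new = []
    for j in range(8):
        t = g(key[7], rcon) if j == 0 else sub_word(new[3]) if j == 4 else new[j - 1]
        new.append(xor_col(t, key[j]))
    return new


def F_inv(new, rcon=1):
    """Inverse of F: recover columns 7..1 first, then column 0."""
    old = [None] * 8
    for j in range(7, 0, -1):
        t = sub_word(new[3]) if j == 4 else new[j - 1]
        old[j] = xor_col(new[j], t)
    old[0] = xor_col(new[0], g(old[7], rcon))
    return old


def related_key(k1, c):
    """K2 = F^{-1}(F(K1) xor C)."""
    return F_inv(xor_key(F(k1), c))


rng = random.Random(0)
rand_key = lambda: [[rng.randrange(256) for _ in range(4)] for _ in range(8)]
C = rand_key()
C[3] = [0, 0, 0, 0]                 # hypothetical choice: no difference enters the middle S-boxes
KA, KA2 = rand_key(), rand_key()    # two unrelated secret keys
KB, KB2 = related_key(KA, C), related_key(KA2, C)
```

Under this choice of `C`, the XOR difference between a key and its related key is the same in columns 1 to 7 (28 bytes) whatever the secret key is, and only column 0 depends on the key through an S-box, which matches the 28-out-of-32 count given in footnote 1.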

Compared to other cryptanalytic attacks in which the attacker can manipulate
only the plaintexts and/or the ciphertexts, the choice of the relation between
secret keys gives an additional degree of freedom to the attacker. The downside of
this freedom is that such attacks might be harder to mount in practice. Still, 
designers usually try to build “ideal” primitives which can be automatically used 
without further analysis in the widest possible set of applications, protocols, or 
modes of operation. Thus resistance to such attacks is an important design goal 
for block ciphers, and in fact it was one of the stated design goals of the Rijndael 
algorithm, which was selected as the Advanced Encryption Standard. 

In this paper we use boomerang attacks in the related-key scenario. In the
following sections we denote the difference between subkeys in the upper trail
by $\Delta K^i$, and in the lower part by $\nabla K^i$.


4.2 Boomerang Switch 


Here we analyze the transition from the sub-trail $E_0$ to the sub-trail $E_1$, which
we call the boomerang switch. We show that the attacker can gain 1-2 middle
rounds for free due to a careful choice of the top and bottom differentials. The
position of the switch is a tradeoff between the sub-trail probabilities, which should
minimize the overall complexity of the distinguisher. Below we summarize the
switching techniques that can be used in boomerang or amplified boomerang 
attacks on any block cipher. 


Ladder switch. By default, a cipher is decomposed into rounds. However, such 
decomposition may not be the best for the boomerang attack. We propose 
not only to further decompose the round into simple operations but also to 
exploit the existing parallelism in these operations. For example, some bytes
may be independently processed. In such a case we can switch in one byte before
it is transformed and in another one after it is transformed; see Fig. 3 for
an illustration.

An example is our attack on AES-192. Let us look at the differential trails 
(see Fig. B). There is one active S-box in round 7 of the lower trail in byte 
bis: On the other hand, the S-box in the same position is not active in the 
upper trail. If we would switch after ShiftRows in round 6, we would “pay” the 
probability in round 7 afterwards. However, we switch all the state except bo,2 
after MixColumns, and switch the remaining byte after the S-box application in 
round 7, where it is not active. We thus do not pay for this S-box. 
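The ladder switch can be tried out on a toy layer of three parallel 4-bit S-boxes. The sketch below is an illustration under assumptions made here: the S-box is an arbitrary 4-bit permutation, the upper trail is active only in nibble 0 (before the S-box layer), and the lower trail is active only in nibble 1 (after the S-box layer). The boundary between the sub-ciphers is therefore placed before the S-box in nibble 0 and after the S-boxes in nibbles 1 and 2.

```python
# Toy 4-bit permutation used as the S-box (an arbitrary choice for illustration).
S = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
S_INV = [S.index(v) for v in range(16)]


def xor(a, b):
    return [s ^ t for s, t in zip(a, b)]


def to_boundary(x):
    """Nibble 0 stops before its S-box; nibbles 1 and 2 stop after theirs."""
    return [x[0], S[x[1]], S[x[2]]]


def from_boundary(m):
    """Walk back from the ladder boundary to the state before the S-box layer."""
    return [m[0], S_INV[m[1]], S_INV[m[2]]]


def ladder_returns(x1, up, low):
    """True if the quartet built at the ladder boundary comes back with the
    upper-trail difference `up` in front of the S-box layer."""
    x2 = xor(x1, up)
    m1, m2 = to_boundary(x1), to_boundary(x2)
    m3, m4 = xor(m1, low), xor(m2, low)      # apply the lower-trail difference
    return xor(from_boundary(m3), from_boundary(m4)) == up


UP = [0x3, 0, 0]     # upper trail: active only in nibble 0
LOW = [0, 0x9, 0]    # lower trail: active only in nibble 1
ALL_RETURN = all(
    ladder_returns([a, b, c], UP, LOW)
    for a in range(16) for b in range(16) for c in range(16)
)
# For comparison: the best probability of pushing UP[0] through one S-box,
# which is what a trail that crosses this S-box would have to pay.
BEST_SBOX_PROB = max(
    sum(1 for x in range(16) if S[x] ^ S[x ^ UP[0]] == d) for d in range(16)
) / 16
```

Every starting state returns with the expected difference, so the layer costs nothing, whereas a trail that crosses the active S-box would pay at least the factor computed in `BEST_SBOX_PROB`.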


1 Note that due to low nonlinearity of the AES-256 key schedule such a subkey relation
corresponds to a fixed XOR relation in 28 out of 32 bytes of the secret key, and a
simple S-box relation in the four remaining bytes.


Fig. 3. The ladder switch in a toy three S-box block. A switch either before or after
the S-box layer would cost probability, while the ladder does not.


Feistel switch. Surprisingly, a Feistel round with an arbitrary function (e.g., an
S-box) can be passed for free in the boomerang attack (this was first observed
in the attack on cipher Khufu in [5]). Suppose the internal state $(X, Y)$ is
transformed to $(Z = X \oplus f(Y), Y)$ at the end of $E_0$. Suppose also that the $E_0$
difference before this transformation is $(\Delta_X, \Delta_Y)$, and that the $E_1$ difference
after this transformation is $(\Delta_Z, \Delta_Y)$.

As a result, variable $Y$ in the four iterations of a boomerang quartet takes two
values: $Y_0$ and $Y_0 \oplus \Delta_Y$ for some $Y_0$. Then the $f$ transformation is guaranteed
to have the same output difference in both pairs of the quartet. Therefore, the
decryption phase of the boomerang creates the difference $\Delta_X$ in $X$ at the end
of $E_0$ "for free". This trick is used in the switch in the subkey in the attack on AES-192.
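The Feistel switch argument can be verified exhaustively on a toy round. In the sketch below, which is an illustration with names chosen here, `f` is a random byte-to-byte table that need not be a permutation, and the three differences are arbitrary example values; the only structural requirement is that the upper and lower trails share the same difference in the `Y` half.

```python
import random


def feistel_switch_holds(f, x1, y1, dx, dy, dz):
    """Build a boomerang quartet around the round (X, Y) -> (X ^ f(Y), Y) and
    report whether the returning pair shows the difference (dx, dy) before it."""
    # Upper pair before the round, then through the round.
    x2, y2 = x1 ^ dx, y1 ^ dy
    z1, z2 = x1 ^ f[y1], x2 ^ f[y2]
    # Lower trail: shift both outputs by (dz, dy) after the round.
    z3, y3 = z1 ^ dz, y1 ^ dy
    z4, y4 = z2 ^ dz, y2 ^ dy
    # Undo the round for the shifted states.
    x3, x4 = z3 ^ f[y3], z4 ^ f[y4]
    return (x3 ^ x4, y3 ^ y4) == (dx, dy)


rng = random.Random(1)
F_TABLE = [rng.randrange(256) for _ in range(256)]   # arbitrary function f
DX, DY, DZ = 0x5A, 0x01, 0xC3                         # example differences
ALWAYS_HOLDS = all(
    feistel_switch_holds(F_TABLE, x, y, DX, DY, DZ)
    for x in range(256) for y in range(256)
)
```

The check succeeds for every state because the shifted pair simply swaps the two `Y` values, so the two evaluations of `f` cancel regardless of what `f` is.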


S-box switch. This is similar to the Feistel switch, but costs probability only
in one of the directions. Suppose that $E_0$ ends with an S-box $Y = S(X)$ with
difference $\Delta$. If the output of an S-box in a cipher has difference $\Delta$ and if the same
difference $\Delta$ comes from the lower trail, then propagation through this S-box is
for free on one of the faces of the boomerang. Moreover, the other direction can
use amplified probability since the specific value of the difference $\Delta$ is not important
for the switch.²


5 Attack on AES-256 


In this section we present a related key boomerang attack on AES-256. 


5.1 The Trail 


2 This type of switch was used in the original version of this paper, but is not needed
now due to a change in the trails. We describe it here for completeness, since it might
be useful in other attacks.

The boomerang trail is depicted in Figure 7 and the actual values are listed in
Tables 2 and 3. It consists of two similar 7-round trails: the first one covers rounds
Table 2. Key schedule difference in the AES-256 trail


? 00 00 00 3e 3e 3e 3e 
2010101 ? 21 21 21 
2.00 00 00 1f 1f 1f 1f 
? 00 00 00 1f 1f 1f 1f 
00 00 00 00 3e 00 00 00 
00 01 00 00 21 00 00 00 
00 00 00 00 1f 00 00 00 
00 00 00 00 1f 00 00 00 


ie a a, 
XXX Kip if if 00 
22? Pap ip if 00 
? ? ? ? 2121 2100 
? 010101 3e 3e 3e 3e 
X 00 00 00 1f 1f 1f 1f 
? 0000 00 1f 1f 1f 1f 
? 00 00 00 21 21 21 21 
01 00 00 00 3e 00 00 00 
00 00 00 00 1f 00 00 00 
00 00 00 00 1f 00 00 00 
00 00 00 00 21 00 00 00 


00 00 00 00 3e 00 3e 00 
00 01 00 01 21 00 21 00 
00 00 00 00 1f 00 1f 00 
00 00 00 00 1f 00 1f 00 
00 00 00 00 3e Be 3e 3e 
000101017 ? ? ? 
00 00 00 00 1f 1f 1f 1f 
00 00 00 00 1f 1f 1f 1f 


? 01? 00? ? 0000 
X 00 X 00 1f 1f 00 00 
? 00 ? 00 1f 1f 00 00 
? 00 ? 00 21 21 00 00 
01 00 01 00 3e 00 3e 00 
00 00 00 00 1f 00 1f 00 
00 00 00 00 1f 00 1f 00 
00 00 00 00 21 00 21 00 
01010101? ? 

00 00 00 00 1f 1f 1f 1f 
00 00 00 00 1f 1f 1f 1f 
00 00 00 00 21 21 21 21 


00 00 00 00 3e 3e 00 00 
00 01 01 00 21 21 00 00 
00 00 00 00 1f 1f 00 00 
00 00 00 00 1f 1f 00 00 


? ? 0000 ? 00 00 00 
X X 0000 1f 00 00 00 
? ? 00 00 1f 00 00 00 
? ? 00 00 21 00 00 00 
01 01 00 00 3e 3e 00 00 
00 00 00 00 1f 1f 00 00 
00 00 00 00 1f 1f 00 00 
00 00 00 00 21 21 00 00 


1-8, and the second one covers rounds 8-14. The trails differ in the position of 
the disturbance bytes: the row 1 in the upper trail, and the row 0 in the lower 
trail. This fact allows the Ladder switch. 

The switching state is the state $A^9$ (internal state after the SubBytes in round
9) and a special key state $K_S$, which is the concatenation of the last four columns
of $K^3$ and the first four columns of $K^4$. Although there are active S-boxes in
the first round of the key schedule, we do not impose conditions on them. As a 
result, the difference in column 0 of K? is unknown yet. 


Related Keys. We define the relation between four keys as follows (see also
Figure 4). For a secret key $K_A$, which the attacker tries to find, compute its
second subkey $K_A^1$, and apply the difference $\Delta K^1$ to get a subkey $K_B^1$, from
which the key $K_B$ is computed. The relation between $K_A$ and $K_B$ is a constant
XOR relation in 28 bytes out of 32 and is computed via a function
$k'_{i,0} = k_{i,0} \oplus S(k_{i+1,7}) \oplus S(k_{i+1,7} \oplus c_{i+1,7})$, $i = 0, 1, 2, 3$,
with constant $c_{i+1,7} = \Delta k^0_{i+1,7}$ for the four remaining bytes.

The switch into the keys $K_C$, $K_D$ happens between the 3rd and the 4th subkeys
in order to avoid active S-boxes in the key-schedule using the Ladder switch
idea described above. We compute subkeys $K^3$ and $K^4$ for both $K_A$ and $K_B$.
We add the difference $\nabla K^3$ to $K_A^3$ and compute the upper half (four columns)
of $K_C^3$. Then we add the difference $\nabla K^4$ to $K_A^4$ and compute the lower half (four
Table 3. Non-zero internal state differences in the AES-256 trail


? 00 00 00 ? 00 00 00 00 00 00 00 00 00 00 00 

EET If ? 1f 1f |443 O01f001f| 445 00 Lf 1f 00 

? 00 ? 00 00 00 ? 00 00 00 00 00 00 00 00 00 

? 0000 ? 00 00 00 ? 00 00 00 00 00 00 00 00 

00 00 00 00 If if lf if If 00 1f 00 If 1f 00 00 

7 00 Lf 00 00 00 00 00 00}. 49 00 00 00 00| 411 00 00 00 00 

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
If 00 00 00 00 00 00 00 
3 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 


Fig. 4. AES-256: Computing $K_B$, $K_C$, and $K_D$ from $K_A$
[Figure: diagram relating the four keys through the subkey differences
$\Delta K^1$, $\nabla K^3$, and $\nabla K^4$.]


columns) of $K_C^4$. From these eight consecutive columns we compute the full $K_C$.
The key $K_D$ is computed from $K_B$ in the same way.

Finally, we point out that the difference between $K_C$ and $K_D$ can be computed in
the backward direction deterministically since there would be no active S-boxes
till the first round. The secret key $K_A$, and the three keys $K_B$, $K_C$, $K_D$ computed
from $K_A$ as described above form a proper related key quartet. Moreover, due
to a slow diffusion in the backward direction, as a bonus we can compute some
values in $\nabla K^i$ even for $i = 0, 1, 2, 3$ (see Table 2). Hence given the byte value
$k^l_{i,j}$ for $K_A$ we can partly compute $K_B$, $K_C$ and $K_D$.


Internal State. The plaintext difference is specified in 9 bytes. We require that 
all the active S-boxes in the internal state should output the difference 0x1f so
that the active S-boxes are passed with probability $2^{-6}$. The only exception is
the first round where the input difference in nine active bytes is not specified.
Let us start a boomerang attack with a random pair of plaintexts that fit the
trail after one round. Active S-boxes in rounds 3-7 are passed with probability
$2^{-6}$ each, so the overall probability is $2^{-30}$.

We switch the internal state in round 9 with the Ladder switch technique:
row 1 is switched before the application of S-boxes, and the other rows are
switched after the S-box layer. As a result, we do not pay for active S-boxes at
all in this round.

The second part of the boomerang trail is quite simple. Three S-boxes in rounds
10-14 contribute to the probability, which is thus equal to $2^{-18}$. Finally we get one
boomerang quartet after the first round with probability $2^{-30-30-18-18} = 2^{-96}$.


5.2 The Attack 
The attack works as follows. Do the following steps $2^{25.5}$ times:


1. Prepare a structure of plaintexts as specified below.

2. Encrypt it on keys $K_A$ and $K_B$ and keep the resulting sets $S_A$ and $S_B$ in
memory.

3. XOR $\Delta_C$ to all the ciphertexts in $S_A$ and decrypt the resulting ciphertexts
with $K_C$. Denote the new set of plaintexts by $S_C$.

4. Repeat the previous step for the set $S_B$ and the key $K_D$. Denote the set of
plaintexts by $S_D$.


5. Compose from $S_C$ and $S_D$ all the possible pairs of plaintexts which are equal
in 56 bits.

6. For every remaining pair check if the difference in $p_{i,0}$, $i > 1$, is equal on both
sides of the boomerang quartet (16-bit filter). Note that $\nabla k^0_{i,7} = 0$, so $\Delta k^0_{i,0}$
should be equal for both key pairs $(K_A, K_B)$ and $(K_C, K_D)$.

7. Filter out the quartets whose difference cannot be produced by active S-boxes
in the first round (one-bit filter per S-box per key pair) and active
S-boxes in the key schedule (one-bit filter per S-box), which is a $2 \cdot 2 + 2 = 6$-bit
filter.

8. Gradually recover key values and differences simultaneously filtering out the 
wrong quartets. 


Each structure has all possible values in column 0 and row 0, and constant values
in the other bytes. Of $2^{72}$ texts per structure we can compose $2^{144}$ ordered
pairs. Of these pairs $2^{144-8 \cdot 9} = 2^{72}$ pass the first round. Thus we expect one
right quartet per $2^{96-72} = 2^{24}$ structures, and three right quartets out of $2^{25.5}$
structures.
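The counting argument above reduces to a few additions of exponents, which the following sketch reproduces. The variable names are chosen here for illustration. The number of active S-boxes in the upper trail (five) is inferred from the stated probability $2^{-30}$, and the factor of four in the data estimate (two encryptions and two decryptions of every text) is an inference from steps 2 to 4 of the attack rather than a figure quoted from the text.

```python
from math import log2

LOG2_SBOX = -6                                 # one active S-box, optimal differential

log2_upper = 5 * LOG2_SBOX                     # rounds 3-7 of the upper trail
log2_lower = 3 * LOG2_SBOX                     # rounds 10-14 of the lower trail
log2_quartet = 2 * log2_upper + 2 * log2_lower # p^2 * q^2

log2_texts = 72                                # texts per structure (9 free bytes)
log2_pairs = 2 * log2_texts                    # ordered pairs per structure
log2_pass_round1 = log2_pairs - 8 * 9          # pairs meeting the 9-byte condition

# Structures needed for one and for three expected right quartets.
log2_structures_one = -log2_quartet - log2_pass_round1
log2_structures_three = log2_structures_one + log2(3)

# Data: 2^25.5 structures, each text processed under four keys.
log2_data = 25.5 + log2_texts + log2(4)

print(log2_quartet, log2_structures_one, round(log2_structures_three, 2), log2_data)
```

The value for three right quartets comes out slightly above 25.5, so the $2^{25.5}$ structures used in the attack correspond to roughly three expected right quartets, and the data estimate agrees with the overall complexity quoted in the introduction.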

Let us now compute the number of noisy quartets. About $2^{144-56-16} = 2^{72}$ pairs
come out of step 6. The next step applies a 6-bit filter, so we get
$2^{72+25.5-6} = 2^{91.5}$ candidate quartets in total.

The remainder of this section deals with gradual recovering of the key and
filtering wrong quartets. The key bytes are recovered as shown in Figure 5.

Fig. 5. Gradual key recovery. Digits stand for the steps, 'D' means difference.


1. First, consider 4-tuples of related key bytes in each position (1, j), j < 4. Two
differences in a tuple are known by default. The third difference is unknown
but is equal for all tuples (see Table 2, where it is denoted by X) and gets
one of $2^7$ values. We use this fact for key derivation and filtering as follows.
Consider key bytes k$ > and k$ 3. The candidate quartet proposes 2? candi- 
dates for both 4-tuples of related-key bytes, or 24 candidates in total. Since 
the differences are related with the X-difference, which is a 9-bit filter, this 
step reveals two key bytes and the value of X and reduces the number of
quartets to $2^{91.5-5} = 2^{86.5}$.

2. Now consider the value of Ak? ,, which is unknown yet and might be different 
in two pairs of related keys. Let us notice that it is determined by the value of 
k$ 7, and Vk3 7 = 0, so that Ak? o is the same for both related key pairs and 
can take 27 values. Each guess ‘of Ak? o proposes key candidates for byte 
k9 o, where we have a 8-bit filter for the 4-tuple of related-key bytes. We 
thus derive the value of kf 9 in all keys and reduce the number of candidate 
quartets to $2^{85.5}$.

3. The same trick holds for the unknown Ak? 4, which can get 27 possible values 
and can be computed for both key pairs simultaneously. Each of these values 
proposes four candidates for k? ,, which are filtered with an 8-bit filter. We 
thus recover k?, and Ak? , and reduce the number of quartets to 27°. 

4. Finally, we notice that Ak? 4 is completely determined by ee o ke nk 25 k? 3 
and k$ 7. There are at most two candidates for the latter value as well as for 
Ak? 4, so we get a 6-bit filter and reduce the number of quartets to 27°. 

5. Each quartet also proposes two candidates for each of key bytes k8 9, k89, 
and k$ 3- Totally, the number of key candidates proposed by each quartet 
is 2°. 


The key candidates are proposed for 11 bytes of each of four related keys. How- 
ever, these bytes are strongly related so the number of independent key bytes on 
which the voting is performed is significantly smaller than 11 x 4. At least, bytes 
Koo, Kia, kag and k$, of K4 and Kc are independent so we recover 15 key 
bytes with 278-5 proposals. The probability that three wrong quartets propose 
the same candidates does not exceed 278°, 

We thus estimate the complexity of the filtering step as $2^{77.5}$ time and memory.
We recover $3 \cdot 7 + 8 \cdot 8 = 85$ bits of $K_A$ (and 85 bits of $K_C$) with $2^{99.5}$ data
and time and $2^{77.5}$ memory.

The remaining part of the key can be found with many approaches. One is 
to relax the condition on one of the active S-boxes in round 3 thus getting four 
more active S-boxes in round 2, which in turn leads to a full-difference state
in round 1. The condition can be actually relaxed only for the first part of the
boomerang (the key pair $(K_A, K_B)$) thus giving a better output filter. For each
candidate quartet we use the key bytes that were recovered at the previous
step to compute $\Delta A^1$ and thus significantly reduce the number of keys that are
proposed by a quartet. We then rank candidates for the first four columns of
$K^0$ and take the candidate that gets the maximal number of votes. Since we
do not make key guesses, we expect that the complexity of this step is smaller
than the complexity of the previous step ($2^{99.5}$). The right quartet also provides
information about four more bytes in the right half of $K^0$ that correspond to
the four active S-boxes in round 2. The remaining 8 bytes of $K_A$ can be found
by exhaustive search.


6 Attack on AES-192 


The key schedule of AES-192 has better diffusion, so it is hard to avoid active S- 
boxes in the subkeys. We construct a related-key boomerang attack with two sub- 
trails of 6 rounds each. The attack is an amplified-boomerang attack because we 
have to deal with truncated differences in both the plaintext and the ciphertext, 
the latter would be expensive to handle in a plain boomerang attack. 


6.1 The Trail 


The trail is depicted in Figure B] and the actual values are listed in Tables 4
and 5. The key schedule codeword is depicted in Figure 6.


Table 4. Internal state difference in the AES-192 trail 


? ? 3e ? If ? 001f 00 1f 00 00 00 17 1f 00 
if lf ? 1f 00 00 ? 00 00 00 00 00 | , ,3 00 00 00 00 
1fifif ? 00 00 00 ? 00 00 00 00 00 00 00 00 
? 21 21 21 ? 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 If 00 00 00 00 00 17 If If 00 00 00 IF 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 


if if 1f if 00 00 1f 00 If 00 00 00 If 1f 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 | „y 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 


00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 1f 00 00 00 00 00 ?? 2 ? TR WR 
00 00 00 00 00 00 00 00 00 00 00 00 1f if 1f 1f 
00 00 00 00 00 00 00 00 00 00 00 00 1f if 1f if 
00 00 00 00 00 00 00 00 00 00 00 00 ? 2? 2? R 
Table 5. Key schedule difference in the AES-192 trail


00 3e 3e 3f 3e OL 
00 1f 1f 1f 1f 00 
00 1f 1f 1f 1f 00 
? 21 21 21 21 00 
00 3e 00 01 01 O1 
00 1f 00 00 00 00 
00 1f 00 00 00 00 
00 21 00 00 00 00 
27? 3e 3f 3e 

2?21f if if 

2?21f lf if 

972 ? 2121 

3e 00 01013701 
1f 00 00 00 1f 00 
1f 00 00 00 1f 00 
? 00 00 00 21 00 
3e 3e 01 00 00 00 
1f 1f 00 00 00 00 
1f 1f 00 00 00 00 
21 21 00 00 00 00 


00 3e 00 3f 01 00 
00 1f 00 1f 00 00 
00 1f 00 1f 00 00 
00 21 00 21 00 00 
00 3e 3e 3f 3e 3f 
OO1f if if 1f if 
OO1f if if 1f if 
To r 2 2 2 2 
? ? 3f 01 3e 00 
?? 1f 00 1f 00 
?? 1f 00 1f 00 
?? ? 002100 
3e 3e 3f 3e 01 00 
1f 1f 1f 1f 00 00 
1f 1f 1f 1f 00 00 
21 21 21 21 00 00 
3e 00 01 01 01 01 
1f 00 00 00 00 00 
1f 00 00 00 00 00 
21 00 00 00 00 00 


00 3e 3e 01 00 00 
00 1f 1f 00 00 00 
00 1f 1f 00 00 00 
00 21 21 00 00 00 


? 3e 01 00 3e 3e 
? 1f 00 00 1f 1f 
? 1f 00 00 1f 1f 
? ? 0000 21 21 
3e 00 3f 01 00 00 


5 lf 00 1f 00 00 00 


1f 00 1f 00 00 00 
21 00 21 00 00 00 
3e 3e 3f 3e 3f 3e 
Ip 1f 1f 1f ig iy 
af af 1f 1f 1f 1f 


PTP??? 

Fig. 6. AES-192 key schedule codeword

Fig. 7. AES-256 $E_0$ and $E_1$ trails. Green ovals show an overlap between the two
trails where the switch happens.

Related Keys. We define the relation between four keys similarly to the attack
on AES-256. Assume we are given a key $K_A$, which the attacker tries to find.
We compute its subkey $K_A^1$ and apply the difference $\Delta K^1$ to get the
subkey $K_B^1$, from which the key $K_B$ is computed. Then we compute the subkeys
$K_A^4$ and $K_B^4$ and apply the difference $\nabla K^4$ to them. We get subkeys
$K_C^4$ and $K_D^4$, from which the keys $K_C$ and $K_D$ are computed.

Now we prove that the keys $K_A$, $K_B$, $K_C$, and $K_D$ form a quartet, i.e. the
subkeys of $K_C$ and $K_D$ satisfy the equations $K_C^l \oplus K_D^l = \Delta K^l$,
$l = 1, 2, 3$. The only active S-box is positioned between $K^3$ and $K^4$, and its
input is $k^3_{0,5}$. However, this S-box gets the same pair of inputs in both key
pairs (see the discussion of the "Feistel switch"). Indeed, if we compute
$\nabla k^3_{0,5}$ from $\Delta K^4$, then it is equal to $\Delta k^3_{0,5}$ = 0x01.
Therefore, if the active S-box gets as input $a$ and $a \oplus 1$ in $K_A$ and $K_B$,
respectively, then it gets $a \oplus 1$ and $a$ in $K_C$ and $K_D$, respectively. As a
result, $K_C^3 \oplus K_D^3 = \Delta K^3$, the further propagation is linear, and so
the four keys form a quartet.

Due to slow diffusion in the backward direction, we can compute some values
in $\nabla K^l$ even for small $l$ (Table 5). Hence, given $k^l_{i,j}$ for $K_A$, we
can partly compute $K_B$, $K_C$ and $K_D$, which provides additional filtration in
the attack.


Internal State. The plaintext difference is specified in 10 bytes; the difference
in the other six bytes is not restricted. The three active S-boxes in rounds
2-4 are passed with probability $2^{-6}$ each. In round 6 (the switching round) we
ask for a fixed difference in only one of the active S-boxes; the other two
S-boxes can output any difference such that it is the same as in the second
related-key pair. Therefore, the amplified probability of round 6 equals
$2^{-6-2\cdot 3.5} = 2^{-13}$. We switch between the two trails before the key
addition in round 6 in all bytes except one, where we switch after the S-box
application in round 7 (the Ladder switch). This trick allows us not to take into
account the only active S-box in the lower trail in round 7. The overall
probability of the rounds 3-6 is $2^{-18-13} = 2^{-31}$.

The lower trail has 8 active S-boxes in rounds 8-12. Only the first four active
S-boxes are restricted in the output difference, which gives us probability
$2^{-24}$ for the lower trail. The ciphertext difference is fully specified in the
middle two rows, and has 35 bits of entropy in the other bytes. More precisely,
each $\nabla c_{0,j}$ is taken from a set of size $2^7$, and all the
$\nabla c_{3,j}$ should be the same on both sides of the boomerang and again should
belong to a set of size $2^7$. Therefore, the ciphertext difference gives us a
93-bit filter.


6.2 The Attack 


We compose $2^{73}$ structures with $2^{48}$ texts each. Then we encrypt all the
texts with the keys $K_A$ and $K_C$, and their complements w.r.t. $\Delta P$ on
$K_B$ and $K_D$. We keep all the data in memory and analyze it with the following
procedure:


1. Compose all candidate plaintext pairs for the key pairs $(K_A, K_B)$ and
$(K_C, K_D)$.
2. Compose and store all the candidate quartets of the ciphertexts.
3. For each guess of the subkey bytes: k§.3, k93, and kj; in Ka; kj; in Ka 
and Kpg: 
(a) Derive values for these bytes in all the keys from the differential trail. 
Derive the yet unknown key differences in AK? and VK®. 
(b) Filter out candidate quartets that contradict VK®. 
(c) Prepare counters for the yet unknown subkey bytes that correspond to 
active S-boxes in the first two rounds and in the last round: kj 9, k61, 
k? 5, k§.9 - in keys Ka and Ko, kô,o, ki, k6,25 6,3 - in keys K4 and
Kp, i.e. 16 bytes in total. 

(d) For each candidate quartet derive possible values for these unknown 
bytes and increase the counters. 

(e) Pick the group of 16 subkey bytes with the maximal number of votes. 

(f) Try all possible values of the yet unknown 9 key bytes in K? and check
whether it is the right key. If not, then go to the first step.


Right quartets. Let us first count the number of right quartets in the data.
Evidently, there exist $2^{128}$ pairs of internal states with the difference
$\Delta A^2$. The inverse application of 1.5 rounds maps these pairs into
structures that we have defined, with $2^{48}$ pairs per structure. Therefore,
each structure has $2^{48}$ pairs that pass 1.5 rounds, and $2^{73}$ structures
have $2^{121}$ pairs. Of these pairs, $2^{(121-31)\cdot 2 - 128} = 2^{52}$ right
quartets can be composed after the switch in the middle. Of these quartets,
$2^{52 - 2\cdot 24} = 16$ right quartets come out of the last round.
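
The counting argument above is pure exponent arithmetic, so it can be checked mechanically. The following is a minimal sketch that tracks every quantity as a base-2 logarithm. The constant and function names are ours, not the authors', and the last function rests on our reading that the texts are encrypted under four keys.

```python
# All quantities are base-2 logarithms. Names are illustrative assumptions.
LOG2_STRUCTURES = 73            # number of structures
LOG2_PAIRS_PER_STRUCTURE = 48   # pairs per structure that pass 1.5 rounds
LOG2_STATE_PAIRS = 128          # pairs of internal states with the required difference
LOG2_P_UPPER = -31              # probability of the upper trail
LOG2_P_LOWER = -24              # probability of the lower trail


def log2_pairs():
    """Pairs available over all structures: 2^73 * 2^48."""
    return LOG2_STRUCTURES + LOG2_PAIRS_PER_STRUCTURE


def log2_quartets_after_switch():
    """Mirrors the expression (121 - 31) * 2 - 128 from the text."""
    return 2 * (log2_pairs() + LOG2_P_UPPER) - LOG2_STATE_PAIRS


def log2_right_quartets():
    """Both sides of the boomerang must follow the lower trail."""
    return log2_quartets_after_switch() + 2 * LOG2_P_LOWER


def log2_data():
    """Assumption: 2^121 texts per key, encrypted under four keys."""
    return LOG2_STRUCTURES + 48 + 2


print(log2_pairs(), log2_quartets_after_switch(), log2_right_quartets(), log2_data())
```

Under these assumptions the sketch reproduces the exponents 121, 52 and 4 (that is, 16 right quartets), and the data exponent agrees with the data complexity stated for the attack.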

Now we briefly describe the attack. Full details will be published in the
extended version. In steps 1 and 2 we compose $2^{152}$ candidate quartets. The
guess of five key bytes gives a 32-bit filter in step 3, so we are left with
$2^{120}$ candidate quartets, which are divided according to $\nabla c_{3,0}$ into
$2^{14}$ groups. Then we perform key ranking in each group and recover 16 more key
bytes. The exhaustive search for the remaining 9 key bytes can be done with
complexity $2^{72}$. The overall time complexity is about $2^{176}$, and the data
complexity is $2^{123}$.


7 Conclusions 


We presented related-key boomerang attacks on the full AES-192 and the full 
AES-256. The differential trails for the attacks are based on the idea of finding 
local collisions in the block cipher. We showed that optimal key-schedule trails 
should be based on low-weight codewords in the key schedule. We also exploit 
various boomerang-switching techniques, which help us to gain free rounds in 
the middle of the cipher. However, both our attacks are still mainly of theoretical 
interest and do not present a threat to practical applications using AES. 

Disclaimer on colors. We make intensive use of colors in our figures in order to
provide a better understanding of the trail construction. In the figures,
different colors refer to different values, which is hard to depict in black and
white. However, we also list all the trail differences in the tables, so all the
color information is actually duplicated.


Trail details. By $\Delta A^i$ we denote the upper trail difference in the internal
state after the S-box layer, and by $\nabla A^i$ the same for the lower trail.


Abstract. In this paper, we formalize an attack scheme using the key-dependent
property, called the key-dependent attack. In this attack, an intermediate value
whose distribution is key-dependent is considered. The attack determines whether
a key is right by conducting a statistical hypothesis test on the intermediate
value. The time and data complexity of the key-dependent attack is also discussed.

We also apply the key-dependent attack to reduced-round IDEA. This attack is
based on the key-dependent distribution of certain items in the Biryukov-Demirci
Equation. The attack on the 5.5-round variant of IDEA requires $2^{21}$ chosen
plaintexts and $2^{112.1}$ encryptions. The attack on the 6-round variant requires
$2^{49}$ chosen plaintexts and $2^{112.1}$ encryptions. Compared with the previous
attacks, the key-dependent attacks on 5.5-round and 6-round IDEA have the lowest
time and data complexity, respectively.


Keywords: Block Cipher, Key-Dependent Attack, IDEA. 


1 Introduction 


In current cryptanalysis of block ciphers, widespread attacks use special
probability distributions of certain intermediate values. These probability
distributions are considered invariant under the different keys used. For
example, differential cryptanalysis [7] makes use of an intermediate differential
that holds with high probability; this probability is assumed not to vary
remarkably with different keys. Linear cryptanalysis [23] is based on the bias of
the linear approximation, which is also generally constant for different keys.

Instead of concentrating on the probability distribution which is invariant for 
different keys, Ben-Aroya and Biham first proposed the key-dependent prop- 
erty in B]. Key-dependent property means that the probability distribution of 
intermediate value varies for different keys. In Ø], an attack on Lucifer using 
key-dependent differential was presented. Knudsen and Rijmen also used a similar
idea to attack DFC in [20].


* This work was supported by NSFC Grant No.60573032, 60773092 and 11th PRP of 
Shanghai Jiao Tong University. 

In this paper, we consider the key-dependent property further. The distribution
of an intermediate value which is key-dependent is called a key-dependent
distribution. Assume that there are some randomly chosen encryptions. The
intermediate values calculated from these encryptions with the actual key should
conform to the key-dependent distribution. On the other hand, if we use a wrong
key to calculate the intermediate values, they are assumed to conform to a random
distribution. Based on the key-dependent distribution, we formalize a scheme for
discovering the actual key by performing a statistical hypothesis test [17] on
possible keys, and we call this scheme the key-dependent attack. For a given key,
the null hypothesis of the test is that the intermediate value conforms to the
key-dependent distribution determined by the key. The samples of the test are
the intermediate values calculated from a few encryptions. If the test is passed,
the given key is concluded to be the actual key; otherwise it is discarded. For
the keys that share the same key-dependent distribution and the same intermediate
value calculation, the corresponding hypothesis tests can be merged to reduce
the time needed. By this criterion, the whole key space is divided into several
key-dependent subsets.

Due to the scheme of the key-dependent attack, the time complexity of the 
attack is determined by the time for distinguishing between the random dis- 
tribution and the key-dependent distribution. The time needed relies on the 
entropy of the key-dependent distribution: the closer the key-dependent distri- 
bution is to the random distribution, the more encryptions are needed. For each 
key-dependent subset, the number of encryptions and the criteria of rejecting 
hypothesis can be chosen so that the attack on this subset is optimized. The 
expected time of the attack on each subset is also obtained. 

The total expected time complexity can be calculated from the expected time 
on each key-dependent subset. Different orders of the key-dependent subsets 
attacked have different expected time complexities. The order with minimal 
expected time complexity is presented. The total expected time complexity is 
also minimized in this way if the actual key is supposed to be chosen uniformly 
from the whole key space. 

This paper also presents a key-dependent attack on the block cipher IDEA
(International Data Encryption Algorithm). The cryptanalysis of IDEA has been
discussed in many papers, and no attack on the full version of IDEA is faster
than exhaustive search so far. We investigate the Biryukov-Demirci Equation,
which is widely used in recent attacks on IDEA. We find that particular items of
the Biryukov-Demirci Equation satisfy a key-dependent distribution under some
specific constraints. This makes it possible to perform the key-dependent attack
on IDEA. The Biryukov-Demirci Equation is used to recover the intermediate values
from encryptions.

Our key-dependent attack on the 5.5-round variant of IDEA requires $2^{21}$ chosen
plaintexts and has a time complexity of $2^{112.1}$ encryptions. Our key-dependent
attack on the 6-round variant of IDEA requires $2^{49}$ chosen plaintexts and has
a time complexity of $2^{112.1}$ encryptions. These attacks use both fewer chosen

Table 1. Selected results of attacks on IDEA


Rounds Attack type Data Time Ref. 
4. Impossible Differential OCP ge 
4.5 Linear 16 CP ghee 
5t Meet-in-the-Middle 274 CP gre IE] 
5t Meet-in-the-Middle PARQE Ors o 
5 Linear 2185 KP Pa 0 
5 Linear 2'° KP 3103 
5 Linear 16 KP go 
5.5 Higher-Order Differential-Linear 2°? CP 2126-85 0 
6 Higher-Order Differential-Linear 2°* — 2°? KP 21°68 0 
5 Key-Dependent 27 OP 25 Section Bal 
5t Key-Dependent 26t KP 25:3 Section Bal 
5.5 Key-Dependent 271 CP 2421 Section EI 
6 Key-Dependent 2° CP 2"! Section EA 


CP - Chosen Plaintext, KP - Known Plaintext. 
† Attack on IDEA starting from the first round.


plaintexts and less time than all the previous corresponding attacks. We also
give two key-dependent attacks on 5-round IDEA starting from the first round.
One requires $2^{17}$ chosen plaintexts and needs $2^{125.5}$ encryptions. The
other one requires $2^{64}$ known plaintexts and needs $2^{115.3}$ encryptions. We
summarize our attacks and previous attacks in Table 1, where the data complexity
is measured in the number of plaintexts and the time complexity is measured in
the number of encryptions needed in the attack.

The paper is organized as follows: In Section 2 we give a general view of the
key-dependent attack. In Section 3 we give a brief description of the IDEA block
cipher. In Section 4 we show that the probability distribution of some items of
the Biryukov-Demirci Equation is a key-dependent distribution. In Section 5 we
present two key-dependent attacks on reduced-round IDEA. Section 6 concludes
this paper.


2 The Key-Dependent Attack 


In B], Ben-Aroya and Biham first proposed the key-dependent property and im- 
plemented a key-dependent differential attack on Lucifer. Knudsen and Rijmen 
also used a similar idea to attack DFC in [20].

In this section, we formalize a scheme of identifying the actual key using the 
following key-dependent property (with high success probability). 


Definition 1. For a block cipher, if the probability distribution of an interme- 
diate value varies for different keys under some specific constraints, then this 
probability distribution is defined as key-dependent distribution. 

Consider some randomly chosen encryptions satisfying the specific constraints.
If one uses the actual key to calculate the intermediate value, it should conform
to the key-dependent distribution. If one uses a wrong key to calculate the
intermediate value, it is assumed to be randomly distributed. With such a
property, determining whether a given key is right can be done by distinguishing
which distribution the intermediate value conforms to: the key-dependent
distribution or the random distribution.

We propose an attack scheme, called the key-dependent attack, using the
key-dependent distribution. The attack uses a statistical hypothesis test, an idea
also used in differential and linear attacks [17], to distinguish between the
key-dependent distribution and the random distribution. For a key, the null
hypothesis of the test is that the intermediate value conforms to the
key-dependent distribution determined by the key. The attack then uses some
samples to determine whether the hypothesis is right. The samples of the
statistical hypothesis test are the intermediate values obtained from the
encryptions satisfying the specific constraints. If the key passes the hypothesis
test, the attack concludes that the key is right; otherwise the key is judged to
be wrong.

For the keys that share the same key-dependent distribution and the same in- 
termediate value calculation, the corresponding hypothesis tests can be merged. 
Hence the whole key space is divided into several key-dependent subsets. (Similar 
idea is proposed in [].) 


Definition 2. A key-dependent subset is a tuple (P,U), where P is a fixed key- 
dependent distribution of intermediate value, and U is a set of keys that share the 
same key-dependent distribution P and the same intermediate value calculation. 


Definition 3. The key fraction (f) of a key-dependent subset is the ratio be- 
tween the size of U and the size of the whole key space. 


The key-dependent attack determines which key-dependent subset the actual key 
is in by conducting hypothesis tests on each key-dependent subset. Such process 
on a key-dependent subset (P,U), called individual attack, can be described as 
the following four phases: 


1. Parameter Determining Phase. Determine the size of the samples and the
criteria of rejecting the hypothesis that the intermediate values conform to $P$.

2. Data Collecting Phase. Randomly choose some encryptions according to the
specific constraints.

3. Judgement Phase. Calculate the intermediate values from the collected
encryptions. If the results satisfy the criteria of rejection, then discard this
key-dependent subset; otherwise enter the next phase.

4. Exhaustive Search Phase. Exhaustively search $U$ to find the whole key. If
the exhaustive search does not find the whole actual key, then start another
individual attack on the next key-dependent subset.


1 Though each individual attack chooses encryptions randomly, one encryption can be
used for many individual attacks, thus reducing the total data complexity.

The time complexity of the key-dependent attack is determined by the time
complexity of each individual attack and the order of performing these individual
attacks.

For a key-dependent subset $(P,U)$, the time needed for the individual attack
relies on the entropy of $P$: the closer $P$ is to the random distribution, the
more difficult the attack is, because the attack needs more encryptions to ensure
the same probability of making the right judgement. This indicates that
individual attacks for different key-dependent subsets have different time
complexities. The time complexity of each individual attack is determined by the
corresponding key-dependent distribution $P$. For each key-dependent subset, the
number of encryptions and the criteria of rejecting the hypothesis are then chosen
to minimize the time complexity of this individual attack.

To minimize the time complexity of an individual attack, the attack should
consider the probability of committing two types of errors: Type I error and
Type II error. A Type I error occurs when the hypothesis is rejected for a
key-dependent subset while in fact the actual key is in $U$; the attack will fail
to find the actual key in this case. The probability of a Type I error is also
called the significance level, denoted by $\alpha$. A Type II error occurs when
the test is passed while in fact the actual key is not in $U$; in this case the
attack enters the exhaustive search phase but does not find the actual key. The
probability of a Type II error is denoted by $\beta$. With a fixed sample size
(denoted by $N$) and significance level $\alpha$, the criterion for rejecting the
hypothesis is determined, and the probability of Type II error $\beta$ is also
fixed. For a fixed sample size, it is impossible to reduce both $\alpha$ and
$\beta$ simultaneously. In order to reduce both $\alpha$ and $\beta$, the attack
has to use a larger sample size, but then the time and data complexity increase.
Hence, an individual attack needs to balance the sample size against the
probability of making a wrong judgement.

For a key-dependent subset $(P,U)$, if the actual key is not in this subset,
the expected time complexity (measured by the number of encryptions) of the
individual attack on this subset is

$$W = N + \beta |U| \qquad (1)$$

If the actual key is in this subset, the expected time of the individual attack on
this subset is

$$R = N + (1 - \alpha)\frac{|U|}{2}$$

Since the time complexity is dominated by attacking wrong key-dependent
subsets (there is only one key-dependent subset containing the actual key), the
attack only needs to minimize the time complexity of the individual attack for
each wrong key-dependent subset to minimize the total time complexity. Although
$\alpha$ does not appear in Equation (1), $\alpha$ affects the success probability
of the attack, so $\alpha$ should also be considered. We set an upper bound on
$\alpha$ to ensure that the success probability is above a fixed value, and then
choose the sample size that minimizes Equation (1), in order to minimize the time
complexity of individual attacks.
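
To make the trade-off between the sample size and the two error probabilities concrete, the following minimal sketch computes them exactly for a simplified test. It is an illustration under stated assumptions, not the test used in the attack: each sample is taken to be a single bit, the key-dependent distribution gives the bit value 1 with probability `p_key` below 0.5, the random distribution gives 1 with probability 0.5, and the subset is accepted when at most `t` ones are observed. Equation (1) is read here as $W = N + \beta|U|$. All function and parameter names are hypothetical.

```python
from math import comb


def binom_cdf(n, p, t):
    """P[X <= t] for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def error_probabilities(n, p_key, t):
    """Accept the subset when at most t of the n sampled bits equal 1.

    alpha: the right subset is rejected (bits follow p_key, more than t ones).
    beta:  a wrong subset is accepted (bits are uniform, at most t ones).
    Assumes p_key < 0.5, so that few ones point to the key-dependent case.
    """
    alpha = 1.0 - binom_cdf(n, p_key, t)
    beta = binom_cdf(n, 0.5, t)
    return alpha, beta


def expected_time_wrong(n, beta, subset_size):
    """Equation (1): samples plus a full search after a Type II error."""
    return n + beta * subset_size


alpha, beta = error_probabilities(n=64, p_key=0.1, t=16)
print(alpha, beta, expected_time_wrong(64, beta, 2**40))
```

Raising the threshold `t` lowers `alpha` and raises `beta`, which is the balance described in the text; only a larger `n` reduces both at once.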

In addition, it is entirely possible that some key-dependent distributions are
so close to the random distribution that the expected time for performing the
hypothesis tests is longer than directly searching the subsets. For these
key-dependent subsets, the attack exhaustively searches the subset directly
instead of using the statistical hypothesis test method.

On the other hand, the time complexity of the key-dependent attack is also
affected by the order of performing individual attacks on different key-dependent
subsets. Because the expected time complexities of individual attacks are
different, different sequences of performing individual attacks result in
different total expected time complexities. Assume that a key-dependent attack
performs individual attacks on $m$ key-dependent subsets in the order
$(P_1,U_1), \ldots, (P_m,U_m)$. Let $f_i$ denote the key fraction of $(P_i,U_i)$,
let $R_i$ denote the expected time for $(P_i,U_i)$ if the actual key is in $U_i$,
and let $W_i$ denote the expected time if the actual key is not in $U_i$. We have
the following result:


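Theorem 1. The expected time for the whole key-dependent attack is minimal
if the following condition is satisfied:

$$\frac{f_1}{W_1} \ge \frac{f_2}{W_2} \ge \cdots \ge \frac{f_m}{W_m}$$


Proof. The expected time of the attack in the order $(P_1,U_1), \ldots, (P_m,U_m)$ is

$$\begin{aligned}
\Phi ={}& f_1[R_1 + \alpha(W_2 + W_3 + \cdots + W_m)] + f_2[W_1 + R_2 + \alpha(W_3 + \cdots + W_m)] \\
&+ f_3[W_1 + W_2 + R_3 + \alpha(W_4 + \cdots + W_m)] + \cdots + f_m(W_1 + W_2 + \cdots + W_{m-1} + R_m) \\
={}& \sum_{i=1}^{m} f_i R_i + \sum_{i=1}^{m}\Big(f_i \sum_{j=1}^{i-1} W_j\Big) + \alpha \sum_{i=1}^{m}\Big(f_i \sum_{j=i+1}^{m} W_j\Big)
\end{aligned} \qquad (2)$$

If the attack is performed in the order
$(P_{s_1},U_{s_1}), (P_{s_2},U_{s_2}), \ldots, (P_{s_m},U_{s_m})$, where
$s_1, s_2, \ldots, s_m$ is a permutation of $1, 2, \ldots, m$, the expected time is

$$\Phi' = \sum_{i=1}^{m} f_{s_i} R_{s_i} + \sum_{i=1}^{m}\Big(f_{s_i} \sum_{j=1}^{i-1} W_{s_j}\Big) + \alpha \sum_{i=1}^{m}\Big(f_{s_i} \sum_{j=i+1}^{m} W_{s_j}\Big)$$

The term $f_i W_j + \alpha f_j W_i$ occurs in $\Phi$ if and only if $j < i$, and
occurs in $\Phi'$ if and only if $j' < i'$, where $s_{i'} = i$ and $s_{j'} = j$. Hence

$$\Phi - \Phi' = \sum_{j<i \text{ and } j'>i'} (f_i W_j + \alpha f_j W_i - f_j W_i - \alpha f_i W_j)$$

Each summand equals $(1-\alpha)(f_i W_j - f_j W_i)$. Since $\alpha < 1$ and
$f_i W_j - f_j W_i \le 0$ for $j < i$, we have $\Phi - \Phi' \le 0$ for any
permutation $s_1, s_2, \ldots, s_m$.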
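
The ordering rule of Theorem 1 can be checked by brute force on a small instance. The sketch below is an illustration only: the values of `f`, `W`, `R` and `ALPHA` are made up, and the function names are ours. The expected time follows the expansion used in the proof: subsets attacked before the right one cost $W$, the right one costs $R$, and later ones are visited only after a Type I error.

```python
from itertools import permutations


def expected_time(order, f, R, W, alpha):
    """Expected total time when the subsets are attacked in the given order."""
    total = 0.0
    for k, i in enumerate(order):
        before = sum(W[j] for j in order[:k])      # wrong subsets tried first
        after = sum(W[j] for j in order[k + 1:])   # reached only after a Type I error
        total += f[i] * (before + R[i] + alpha * after)
    return total


def order_by_theorem(f, W):
    """Indices sorted by f_i / W_i in non-increasing order."""
    return sorted(range(len(f)), key=lambda i: f[i] / W[i], reverse=True)


# Hypothetical example values, chosen only for illustration.
f = [0.1, 0.4, 0.2, 0.3]
W = [5.0, 2.0, 8.0, 3.0]
R = [4.0, 1.5, 6.0, 2.5]
ALPHA = 0.01

order = order_by_theorem(f, W)
brute_force_best = min(
    expected_time(list(s), f, R, W, ALPHA) for s in permutations(range(len(f)))
)
print(order, expected_time(order, f, R, W, ALPHA), brute_force_best)
```

On this instance the order given by the theorem attains the minimum found by exhaustive enumeration of all 24 orders.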


In the following sections of this paper, we present a concrete key-dependent 
attack on the block cipher IDEA. 


3 The IDEA Block Cipher


In this section, we give a brief introduction of IDEA and notations used later in 
this paper. 

The IDEA block cipher encrypts a 64-bit plaintext with a 128-bit key by an
8.5-round encryption. The fifty-two 16-bit subkeys are generated from the 128-bit
key $Z$ by the key-schedule algorithm. The subkeys are generated in the order
$Z_1^1, Z_2^1, \ldots, Z_6^1, Z_1^2, \ldots, Z_6^8, Z_1^9, \ldots, Z_4^9$. The key $Z$
is partitioned into eight 16-bit words which are used as the first eight subkeys.
The key $Z$ is then cyclically shifted to the left by 25 bits and partitioned
again to generate the following eight subkeys. This process is repeated until all
the subkeys are obtained. In Table 2 the correspondence between the subkeys and
the key $Z$ is directly given.
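
The key schedule described above is simple enough to sketch in a few lines. The code below is an illustration with function names of our own choosing; it numbers key bits from 0 at the most significant end, and it returns the bit range of $Z$ covered by each of the 52 subkeys together with the subkey values.

```python
MASK128 = (1 << 128) - 1


def idea_subkey_bit_ranges():
    """Return 52 (first_bit, last_bit) pairs; bit 0 is the leftmost key bit."""
    ranges = []
    for k in range(52):
        block, pos = divmod(k, 8)               # block = number of 25-bit rotations so far
        first = (25 * block + 16 * pos) % 128
        ranges.append((first, (first + 15) % 128))
    return ranges


def idea_subkeys(key):
    """Extract the 52 16-bit subkeys from a 128-bit integer key."""
    subkeys = []
    for first, _ in idea_subkey_bit_ranges():
        # Rotate left so that bit `first` becomes the top bit, then keep 16 bits.
        rotated = ((key << first) | (key >> (128 - first))) & MASK128
        subkeys.append(rotated >> 112)
    return subkeys


ranges = idea_subkey_bit_ranges()
# Subkey k (0-based) is Z_{k % 6 + 1} of round k // 6 + 1.
for rnd in range(9):
    row = ranges[6 * rnd: 6 * rnd + 6]
    print(rnd + 1, ["%d-%d" % r for r in row])
```

A range such as 121-8 wraps around the end of the key, which the rotation in `idea_subkeys` handles directly.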

The block cipher partitions the 64-bit plaintext into four 16-bit words and
uses three different group operations on pairs of 16-bit words: exclusive OR,
denoted by $\oplus$; addition modulo $2^{16}$, denoted by $\boxplus$; and
multiplication modulo $2^{16}+1$ (0 is treated as $2^{16}$), denoted by $\odot$.

As shown in Figure 1, each round of IDEA contains three layers: the KA layer, the
MA layer and the Permutation layer. We denote the 64-bit input of round $i$ by
$X^i = (X_1^i, X_2^i, X_3^i, X_4^i)$. In the KA layer, the first and the fourth
words are modular multiplied with $Z_1^i$ and $Z_4^i$, respectively. The second
and the third words are modular added with $Z_2^i$ and $Z_3^i$, respectively. The
output of the KA layer is denoted by $Y^i = (Y_1^i, Y_2^i, Y_3^i, Y_4^i)$.

In the MA layer, two intermediate values $p^i = Y_1^i \oplus Y_3^i$ and
$q^i = Y_2^i \oplus Y_4^i$ are computed first. These two values are processed to
give $u^i$ and $t^i$,

$$u^i = (p^i \odot Z_5^i) \boxplus t^i$$

$$t^i = ((p^i \odot Z_5^i) \boxplus q^i) \odot Z_6^i$$

We denote by $s^i$ the intermediate value $p^i \odot Z_5^i$ for convenience. The
output of the MA layer is then permuted to give the output of this round
$(Y_1^i \oplus u^i, Y_3^i \oplus u^i, Y_2^i \oplus t^i, Y_4^i \oplus t^i)$, which is
also the input of round $i+1$, denoted by
$(X_1^{i+1}, X_2^{i+1}, X_3^{i+1}, X_4^{i+1})$. Complete diffusion, which means
that every bit of $(X_1^{i+1}, X_2^{i+1}, X_3^{i+1}, X_4^{i+1})$ is affected by
every bit of $(Y_1^i, Y_2^i, Y_3^i, Y_4^i)$, is obtained in the MA layer.


Table 2. The Key-Schedule of IDEA


Round i  Z_1^i    Z_2^i    Z_3^i    Z_4^i    Z_5^i    Z_6^i
1        0-15     16-31    32-47    48-63    64-79    80-95
2        96-111   112-127  25-40    41-56    57-72    73-88
3        89-104   105-120  121-8    9-24     50-65    66-81
4        82-97    98-113   114-1    2-17     18-33    34-49
5        75-90    91-106   107-122  123-10   11-26    27-42
6        43-58    59-74    100-115  116-3    4-19     20-35
7        36-51    52-67    68-83    84-99    125-12   13-28
8        29-44    45-60    61-76    77-92    93-108   109-124
9        22-37    38-53    54-69    70-85


Fig. 1. Round i of IDEA


In this paper, we will use $P = (P_1, P_2, P_3, P_4)$ and
$P' = (P'_1, P'_2, P'_3, P'_4)$ to denote a pair of plaintexts, where $P_i$ and
$P'_i$ are 16-bit words. $C = (C_1, C_2, C_3, C_4)$ and
$C' = (C'_1, C'_2, C'_3, C'_4)$ are their respective ciphertexts. We also use the
symbol $'$ to distinguish the intermediate values corresponding to $P'$ from those
corresponding to $P$. For example, $s^i$ is obtained from plaintext $P$, and $P'$
will generate $s'^i$. The notation $\Delta$ will denote the XOR difference; for
instance, $\Delta s^i$ is equal to $s^i \oplus s'^i$.


4 The Key-Dependent Distribution of IDEA 


In this section, we describe the key-dependent distribution of the block cipher 
IDEA, which will be used in our attack later. The notations used are the same 
as in [E]. 

The Biryukov-Demirci relation was first proposed by Biryukov [6] and
Demirci [3]. Many papers have discussed attacks on IDEA using this relation.
The relation can be written in the following form (LSB denotes the least
significant bit):

$$\begin{aligned}
\mathrm{LSB}(C_2 \oplus C_3) = \mathrm{LSB}(&P_2 \oplus P_3 \oplus Z_2^1 \oplus Z_3^1 \oplus s^1 \oplus Z_2^2 \oplus Z_3^2 \oplus s^2 \\
&\oplus Z_2^3 \oplus Z_3^3 \oplus s^3 \oplus Z_2^4 \oplus Z_3^4 \oplus s^4 \oplus Z_2^5 \oplus Z_3^5 \oplus s^5 \\
&\oplus Z_2^6 \oplus Z_3^6 \oplus s^6 \oplus Z_2^7 \oplus Z_3^7 \oplus s^7 \oplus Z_2^8 \oplus Z_3^8 \oplus s^8 \\
&\oplus Z_2^9 \oplus Z_3^9)
\end{aligned} \qquad (3)$$

It has been shown that, for two plaintext-ciphertext pairs $(P,C)$ and
$(P',C')$, XORing their corresponding Biryukov-Demirci relations gives, from
Equation (3),

$$\begin{aligned}
\mathrm{LSB}(C_2 \oplus C_3 \oplus C'_2 \oplus C'_3) = \mathrm{LSB}(&P_2 \oplus P_3 \oplus P'_2 \oplus P'_3 \oplus \Delta s^1 \oplus \Delta s^2 \oplus \Delta s^3 \\
&\oplus \Delta s^4 \oplus \Delta s^5 \oplus \Delta s^6 \oplus \Delta s^7 \oplus \Delta s^8)
\end{aligned} \qquad (4)$$

We call Equation (4) the Biryukov-Demirci Equation.
The following theorem shows that the probability distribution of
$\mathrm{LSB}(\Delta s^i)$ in the Biryukov-Demirci Equation is a key-dependent
distribution.


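Theorem 2. Consider round $i$ of IDEA. If a pair of intermediate values
$(p^i, p'^i)$ satisfies $\Delta p^i = 8000_x$, then the probability of
$\mathrm{LSB}(\Delta s^i) = \mathrm{LSB}(8000_x \odot Z_5^i)$ is

$$\mathrm{Prob}(\mathrm{LSB}(\Delta s^i) = \mathrm{LSB}(8000_x \odot Z_5^i)) = \frac{\#W}{2^{15}} \qquad (5)$$

where $W$ is the set of all 16-bit words $w$ such that $1 \le w \le 8000_x$ and

$$(w * Z_5^i) + (8000_x * Z_5^i) < 2^{16} + 1$$

where $*$ is defined as

$$a * b = \begin{cases} a \odot b & \text{if } a \odot b \ne 0 \\ 2^{16} & \text{if } a \odot b = 0 \end{cases}$$

Proof. Consider every intermediate pair $(p^i, p'^i)$ which satisfies
$\Delta p^i = 8000_x$, excluding $(0, 8000_x)$. We have $p'^i = p^i + 8000_x$ or
$p^i = p'^i + 8000_x$. Without loss of generality, assume $p'^i = p^i + 8000_x$,
where $1 \le p^i < 8000_x$ and $8000_x < p'^i < 2^{16}$. If we consider only the
least significant bit, $\mathrm{LSB}(s^i) = \mathrm{LSB}(p^i * Z_5^i)$. The
following equations also hold:

$$\begin{aligned}
\mathrm{LSB}(s'^i) &= \mathrm{LSB}(p'^i * Z_5^i) \\
&= \mathrm{LSB}((p^i + 8000_x) * Z_5^i) \\
&= \mathrm{LSB}\big(((p^i * Z_5^i) + (8000_x * Z_5^i)) \bmod (2^{16}+1)\big)
\end{aligned} \qquad (6)$$

In the special case when $(p^i, p'^i)$ is $(0, 8000_x)$, let $p^i = 8000_x$ and
$p'^i = 0$. Equation (6) also holds, because $p'^i = 0$ is actually treated as
$2^{16}$ for inputs of $\odot$ and $*$.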

[Figure: $\mathrm{Prob}(\mathrm{LSB}(\Delta s^i) = 1)$ plotted against $Z_5^i$ from
0 to 65535, showing the key-dependent distribution and the random distribution.]

Fig. 2. The key-dependent distribution of $\mathrm{Prob}(\mathrm{LSB}(\Delta s^i) = 1)$ on the value of $Z_5^i$


If $(p^i * Z_5^i) + (8000_x * Z_5^i)$ is smaller than $2^{16}+1$, then
$\mathrm{LSB}(s'^i) = \mathrm{LSB}(s^i) \oplus \mathrm{LSB}(8000_x * Z_5^i)$ holds
because of the equivalence of XOR and modular addition for the least significant
bit. Moreover, $\mathrm{LSB}(\Delta s^i) = \mathrm{LSB}(8000_x * Z_5^i)$ is
satisfied, which means $\mathrm{LSB}(\Delta s^i) = \mathrm{LSB}(8000_x \odot Z_5^i)$.

Otherwise, $\mathrm{LSB}(s'^i)$ is equal to
$\mathrm{LSB}(s^i) \oplus \mathrm{LSB}(8000_x * Z_5^i) \oplus 1$ because of the
carry (the reduction subtracts the odd number $2^{16}+1$). So
$\mathrm{LSB}(\Delta s^i)$ equals $\mathrm{LSB}(8000_x \odot Z_5^i) \oplus 1$.

Therefore, we may conclude that
$\mathrm{LSB}(\Delta s^i) = \mathrm{LSB}(8000_x \odot Z_5^i)$ if and only if the
pair $(p^i, p'^i)$ satisfies $(w * Z_5^i) + (8000_x * Z_5^i) < 2^{16}+1$, where $w$
is whichever of $p^i$ and $p'^i$ lies between 1 and $8000_x$. There are at most
$2^{15}$ such $w$, hence Equation (5) holds. This completes the proof.
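
Theorem 2 is an exact counting statement, so it can be verified exhaustively for any fixed subkey. The sketch below is an illustration with names of our own choosing. It counts, over all 16-bit values $p$, how often the least significant bit of the difference of the two multiplication outputs matches the bit predicted by the theorem, and compares this with the size of the set $W$. The direct count runs over ordered pairs, so it should equal twice $\#W$.

```python
MOD = (1 << 16) + 1


def star(a, b):
    """Product modulo 2^16 + 1 as an integer in 1..2^16 (the word 0 stands for 2^16)."""
    return ((a or 0x10000) * (b or 0x10000)) % MOD


def mul(a, b):
    """IDEA multiplication on 16-bit words: the residue 2^16 is stored as the word 0."""
    return star(a, b) & 0xFFFF


def count_direct(z5):
    """Ordered pairs (p, p ^ 0x8000) whose LSB of the output difference matches the theorem."""
    target = mul(0x8000, z5) & 1
    return sum(
        1
        for p in range(0x10000)
        if ((mul(p, z5) ^ mul(p ^ 0x8000, z5)) & 1) == target
    )


def size_of_W(z5):
    """Number of w in 1..0x8000 with (w * z5) + (0x8000 * z5) < 2^16 + 1."""
    c = star(0x8000, z5)
    return sum(1 for w in range(1, 0x8001) if star(w, z5) + c < MOD)


for z5 in (1, 2, 3, 0x1234):
    print(hex(z5), count_direct(z5), 2 * size_of_W(z5))
```

Dividing either count by the number of pairs gives the probability in the theorem; the subkey values in the loop are arbitrary examples.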


Remark 1. Figure 2 plots the relation between the subkey $Z_5^i$ and the
probability of $\mathrm{LSB}(\Delta s^i) = 1$. As shown in Figure 2, for most
$Z_5^i$, the probability of $\mathrm{LSB}(\Delta s^i) = 1$ is different from the
random distribution. Hence, it is possible to perform a key-dependent attack on
IDEA using this key-dependent distribution. For most $Z_5^i$, there are generally
four cases for the probability of $\mathrm{LSB}(\Delta s^i) = 1$ as $Z_5^i$ grows
from 0 to $2^{16} - 1$, which can be roughly approximated as follows:

$$\mathrm{Prob}(\mathrm{LSB}(\Delta s^i) = 1) \approx \begin{cases}
\dfrac{Z_5^i}{2^{17}} & \text{last two bits of } Z_5^i = 00 \\
0.5 - \dfrac{Z_5^i}{2^{17}} & \text{last two bits of } Z_5^i = 01 \\
1.0 - \dfrac{Z_5^i}{2^{17}} & \text{last two bits of } Z_5^i = 10 \\
0.5 + \dfrac{Z_5^i}{2^{17}} & \text{last two bits of } Z_5^i = 11
\end{cases} \qquad (7)$$

From Equation (7), the following approximation also holds for most $Z_5^i$:

$$\min\{\mathrm{Prob}(\mathrm{LSB}(\Delta s^i) = 0), \mathrm{Prob}(\mathrm{LSB}(\Delta s^i) = 1)\} \approx \begin{cases}
\dfrac{Z_5^i}{2^{17}} & \mathrm{LSB}(Z_5^i) = 0 \\
0.5 - \dfrac{Z_5^i}{2^{17}} & \mathrm{LSB}(Z_5^i) = 1
\end{cases} \qquad (8)$$

Calculation shows that, for only 219 out of all $2^{16}$ possible $Z_5^i$, the
difference between the approximation (Equation (7) or (8)) and the accurate
probability is larger than 0.01.

Equation (8) indicates that we can approximate the left-hand side of Equation (8)
by fixing several most significant bits and the least significant bit of $Z_5^i$.
In the following sections, we will show that we only need to distinguish the
approximate probability distribution from the random distribution. Hence, for
most $Z_5^i$, this approximation is close enough to the accurate value. For
$Z_5^i$ that cannot be approximated in this way, we use other methods to deal
with the situation.


5 The Key-Dependent Attack on IDEA 


In this section, we will present two key-dependent attacks on reduced-round
IDEA. In Section 5.1 we will give a basic attack on the 5.5-round variant of
IDEA and then extend it to the 6-round variant in Section 5.2. We also give two
key-dependent attacks on 5-round IDEA starting from the first round in Section 5.3.


5.1 The Attack on 5.5-Round Variant of IDEA 


We first present one key-dependent attack on the 5.5-round variant of IDEA.
The attack starts from the third round and ends before the MA layer of the
eighth round. The main idea of this attack is to perform the key-dependent attack
based on the key-dependent distribution of $\Delta s^4$ described in Theorem 2.

Consider the 5.5-round variant of IDEA starting from the third round. The
Biryukov-Demirci Equation can be rewritten as

$$\mathrm{LSB}(\Delta s^4) = \mathrm{LSB}(P_2 \oplus P_3 \oplus P'_2 \oplus P'_3 \oplus C_2 \oplus C_3 \oplus C'_2 \oplus C'_3 \oplus \Delta s^3 \oplus \Delta s^5 \oplus \Delta s^6 \oplus \Delta s^7) \qquad (9)$$

where $P$ and $P'$ are equivalent to $X^3$ and $X'^3$, and $C$ and $C'$ are
equivalent to $Y^8$ and $Y'^8$ in this variant of IDEA.
We first construct a pair of plaintexts satisfying the specific constraint
$\Delta p^4 = 8000_x$. The construction is based on the following lemma.


Lemma 1. For any $a$, if two 16-bit words $x$ and $x'$ have the same least
significant 15 bits, then


• $x \oplus a$ and $x' \oplus a$ have the same least significant 15 bits,
• $x \boxplus a$ and $x' \boxplus a$ have the same least significant 15 bits.
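
Lemma 1 can be spot-checked numerically. The following Python sketch is an illustration added for the reader and not part of the original analysis. It assumes that the two operations in the lemma are bitwise XOR and addition modulo $2^{16}$ on 16-bit words; the function name `low15_preserved` is hypothetical.

```python
import random

MASK16 = 0xFFFF   # 16-bit words
LOW15 = 0x7FFF    # mask selecting the least significant 15 bits

def low15_preserved(x, a):
    """Check Lemma 1 for the pair (x, x') that differs only in the top bit."""
    x2 = x ^ 0x8000                       # x' shares the low 15 bits with x
    xor_ok = ((x ^ a) & LOW15) == ((x2 ^ a) & LOW15)
    add_ok = (((x + a) & MASK16) & LOW15) == (((x2 + a) & MASK16) & LOW15)
    return xor_ok and add_ok

# Randomized spot check; the property itself holds for every pair because the
# low 15 bits of an XOR or a sum depend only on the low 15 bits of the inputs.
rng = random.Random(0)
all_ok = all(low15_preserved(rng.randrange(1 << 16), rng.randrange(1 << 16))
             for _ in range(10000))
print(all_ok)
```

The check only covers pairs that differ in the most significant bit, since two distinct 16-bit words with equal low 15 bits must differ exactly there.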


Based on Lemma 1, the following proposition can be obtained.


30 X. Sun and X. Lai 


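Proposition 1. If a pair of intermediate values $Y^3$ and $Y'^3$ satisfy the following
conditions:


a. $\Delta Y_1^3 = \Delta Y_3^3 = 0$,
b. $\Delta Y_2^3 = 8000_x$,
c. $Y_2^3 \oplus Y_4^3 = Y'^3_2 \oplus Y'^3_4$,


then $\Delta s^3 = 0$ and the probability of $\mathrm{LSB}(\Delta s^4) = 0$ can be determined by
the key-dependent distribution of Section 4.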


Proof. From Condition (a), $\Delta Y_1^3 = \Delta Y_3^3 = 0$, so $p^3$ is equal to $p'^3$. Then
$\Delta s^3 = 0$ follows directly.

From Condition (c), $q^3$ is equal to $q'^3$. Since $p^3$ and $q^3$ are fixed, $u^3$ and $t^3$ are
also fixed for any $Z_5^3$ and $Z_6^3$. It follows that $X_1^4 = Y_1^3 \oplus u^3 = X'^4_1$.
Note that $Y_1^4$ and $Y'^4_1$ are the results of modular multiplication of $X_1^4$ and $X'^4_1$ by
the same $Z_1^4$, hence $Y_1^4$ is equal to $Y'^4_1$.

On the other hand, $\Delta Y_2^3 = 8000_x$ means that the least significant 15 bits of
$Y_2^3$ are equal to those of $Y'^3_2$, while the most significant bits of $Y_2^3$ and $Y'^3_2$
differ. Because $u^3$ is fixed, by Lemma 1 the least significant 15 bits of $X_3^4$ are
equal to those of $X'^4_3$. Then $\Delta X_3^4$ is equal to $8000_x$, and $\Delta Y_3^4 = 8000_x$ is obtained
by modular addition with the same $Z_3^4$. From $\Delta Y_1^4 = 0$ and $\Delta Y_3^4 = 8000_x$, $\Delta p^4$
is $8000_x$. The conclusion then follows from the key-dependent distribution of
Section 4. □


In our attack, we use plaintext pairs satisfying Proposition 1. We obtain
Condition (a) by letting $\Delta P_1 = \Delta P_3 = 0$. By Lemma 1, $P_2$ and $P'_2$ are fixed
to have the same least significant 15 bits, and hence $\Delta Y_2^3 = 8000_x$. In order to
fulfill Condition (c), we have to guess $Z_4^3$ and then, according to this guess,
choose $P_4$ and $P'_4$ which satisfy $\Delta Y_4^3 = 8000_x$.

By Proposition 1, $\Delta s^3$ is equal to zero. In order to get the right-hand side
of Equation (9), we still need to get $\Delta s^5$, $\Delta s^6$, $\Delta s^7$. We need to guess $Z_5^5$, $Z_1^6$,
$Z_2^6$, $Z_5^6$, $Z_6^6$, $Z_1^7$, $Z_2^7$, $Z_3^7$, $Z_4^7$, $Z_5^7$, $Z_6^7$, $Z_1^8$, $Z_2^8$, $Z_3^8$, $Z_4^8$. As shown in É], one can
partially decrypt one pair of encryptions using these 15 subkeys to calculate the
values of $\Delta s^5$, $\Delta s^6$, $\Delta s^7$. These 15 subkeys only take key bits 125-99 and also
cover the subkey $Z_4^3$. Hence, for each guess of these 103 key bits, we can calculate the
value of $\Delta s^4$ from a special pair of encryptions.
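
The key-bit count can be cross-checked against the IDEA key schedule. The sketch below is an added illustration under stated assumptions: it uses the standard IDEA schedule (52 subkeys, taken 16 bits at a time, with the 128-bit key rotated left by 25 bits after every eight subkeys, bit 0 being the most significant), and the list of 15 subkeys is a reconstruction from the attack description rather than a quotation. The helper name `subkey_bits` is hypothetical.

```python
def subkey_bits(round_no, j):
    """Key-bit positions (0 = most significant) used by subkey Z_j of a round."""
    m = 6 * (round_no - 1) + (j - 1)          # 0-based index among the 52 subkeys
    start = (25 * (m // 8) + 16 * (m % 8)) % 128
    return {(start + t) % 128 for t in range(16)}

# Assumed list of the 15 guessed subkeys, written as (round, index).
subkeys_55 = [(5, 5), (6, 1), (6, 2), (6, 5), (6, 6),
              (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6),
              (8, 1), (8, 2), (8, 3), (8, 4)]

covered = set().union(*(subkey_bits(r, j) for r, j in subkeys_55))
expected = {b % 128 for b in range(125, 125 + 103)}   # bits 125..127 and 0..99

print(len(covered), covered == expected)
print(subkey_bits(3, 4) <= covered)   # the round-3 subkey with index 4
print(subkey_bits(4, 5) <= covered)   # the round-4 subkey with index 5
```

Under these assumptions the 15 subkeys span 103 key bits, wrapping around from bit 125 to bit 99, and the two additional subkeys used by the attack fall inside the same range.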

We also note that these 103 bits cover the key $Z_5^4$, which determines the
key-dependent distribution of $\Delta s^4$ as described in Section 4. Therefore, we can
perform the key-dependent attack on the 5.5-round variant of IDEA. As described
in Section 2, the key space can be divided into $2^{103}$ key-dependent subsets by
the 103 key bits, each containing $2^{25}$ keys.

For a key-dependent subset (P, U), let p denote the probability of
$\mathrm{LSB}(\Delta s^4) = \mathrm{LSB}(8000_x \odot Z_5^4)$. For simplicity, in the following analysis we
assume that p < 0.5; the case p > 0.5 is similar. Assume the sample consists of n pairs of
encryptions that satisfy the specific constraint on this key-dependent subset, and
t of them satisfy $\mathrm{LSB}(\Delta s^4) = \mathrm{LSB}(8000_x \odot Z_5^4)$. The criterion for not rejecting
the hypothesis is that t is smaller than or equal to a fixed value k. The probability of
the Type I error is

$$\alpha = \sum_{i=k+1}^{n} \binom{n}{i} p^i (1-p)^{n-i}$$

and the probability of the Type II error is

$$\beta = \sum_{i=0}^{k} \binom{n}{i} 0.5^n$$

If (P, U) is a wrong key-dependent subset, the expected time complexity of
checking this subset is


$$W = 2n + 2^{25}\beta \qquad (10)$$


As shown in Section 2, the attack sets $\alpha$ smaller than or equal to 0.01 to ensure
that the probability of false rejection does not exceed 0.01. Under this
precondition, the attack chooses n and k so that $\alpha \le 0.01$ and Equation (10) is
minimized, which minimizes the time complexity on each key-dependent subset (P, U).
By Section 2, this method minimizes the total expected time complexity.
Because this choice depends only on the key $Z_5^4$, we only need to compute n and
k for $2^{16}$ different values.

For example, for a key-dependent subset (P, U) with $Z_5^4 = 8001_x$, p is about
0.666687. The attack checks every possible n and k to find the minimal
expected time complexity of the individual attack for this subset. As shown in
Section 2, the expected time complexity for each subset is upper bounded by
exhaustive search on the subset, which is $2^{25}$ in this attack. Hence, the attack only
checks the n and k smaller than $2^{25}$. The expected time is minimized under the
precondition $\alpha \le 0.01$ when n = 425 and k = 164. In this case, $\alpha = 0.009970$,
$\beta = 0.000001$ and W = 899.094678.

Fig. 3. The number of encryptions used and expected time complexity for individual
attacks (horizontal axis: i-th individual attack in the performing order, with ticks
at 0, $2^{101}$, $2^{102}$, $3 \cdot 2^{101}$ and $2^{103}$; vertical axis: encryptions)
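
The error probabilities of such a test can be evaluated directly from binomial sums. The Python sketch below is an added illustration, not the authors' implementation. It assumes that the test does not reject when at most k of the n pairs hit the tested event, that a wrong subset behaves like a fair coin, and that the cost of a wrong subset has the form $2n + 2^{25}\beta$. For the example with p about 0.666687 it further assumes that the "similar" case p > 0.5 is handled by testing the complementary event, whose probability is 1 - 0.666687. The function names are hypothetical.

```python
from math import comb

def error_probabilities(n, k, p):
    """Type I and Type II error of the rule 'do not reject iff t <= k'.

    p (assumed < 0.5) is the per-pair probability of the tested event under
    the correct key-dependent subset; a wrong subset is modelled with 0.5.
    """
    # Type I: correct subset, but more than k hits are observed.
    alpha = sum(comb(n, i) * p**i * (1 - p)**(n - i)
                for i in range(k + 1, n + 1))
    # Type II: wrong subset, but at most k hits are observed.
    beta = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return alpha, beta

def expected_cost_wrong_subset(n, beta, remaining_bits=25):
    """Assumed cost model W = 2n + 2^remaining_bits * beta, in encryptions."""
    return 2 * n + (2**remaining_bits) * beta

p = 1 - 0.666687                      # complementary event, so that p < 0.5
alpha, beta = error_probabilities(425, 164, p)
W = expected_cost_wrong_subset(425, beta)
print(f"alpha={alpha:.6f} beta={beta:.6f} W={W:.3f}")
```

Under these assumptions the printed values should land close to the figures quoted for n = 425 and k = 164. A search over n and k for the minimal W subject to the bound on the Type I error could reuse the same two functions.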

Since all the key-dependent subsets have the same key fraction, the order
of performing individual attacks with minimal expected time complexity is
the ascending order of W over all key-dependent subsets, as established in Section 2.
Figure 3 plots the number of encryptions used and the expected time complexity for
all the individual attacks.

The total expected time complexity of the attack, given by the general formula
of Section 2 summed over all $2^{103}$ key-dependent subsets in the order above,
evaluates to approximately

$$2^{112.1}$$

encryptions, with 99% success probability if the attack chooses n and k for each
key-dependent subset and determines the order of performing individual attacks as
shown above. The attack uses a set of $2^{21}$ plaintexts, which can provide $2^{20}$
plaintext pairs satisfying the conditions in Proposition 1 for each key-dependent
subset; this covers the number of pairs needed in one test even in the worst case.


The attack is summarized as follows: 


1. For every possible $Z_5^4$, calculate the corresponding number of plaintext pairs
needed n and the criterion for not rejecting the hypothesis k.

2. Let S be an empty set. Randomly choose a 16-bit word s and insert s
and $s \oplus 8000_x$ into the set S. Repeat until S contains
$2^5$ different words. Ask for the encryption of all the plaintexts of the form
(A, B, C, D), where A and C are fixed to two arbitrary constants, B takes
all the values in S and D takes all possible 16-bit values.


3. Enumerate the key-dependent subsets in ascending order of W:

(a) Randomly choose a set of plaintext pairs with cardinality n from the
known encryptions. The plaintext pairs must satisfy the requirements of
Proposition 1.

(b) Partially decrypt all the selected encryption pairs and count the
occurrences of $\mathrm{LSB}(\Delta s^4) = 1$.

(c) Test the hypothesis. If the hypothesis is not rejected, perform exhaustive
search for the remaining 25 key bits.
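
Step 2 of the summary can be sketched as follows. This Python fragment is an added illustration: it assumes that the set S holds $2^5$ words, so that the structure contains $2^{21}$ plaintexts, and it only enumerates the plaintexts without querying any encryption oracle. The function names are hypothetical.

```python
import random

def build_msb_closed_set(size, rng):
    """Collect `size` distinct 16-bit words, closed under XOR with 0x8000."""
    assert size % 2 == 0
    words = set()
    while len(words) < size:
        s = rng.randrange(1 << 16)
        words.add(s)
        words.add(s ^ 0x8000)   # partner that differs only in the top bit
    return words

def chosen_plaintexts(words, a_const, c_const):
    """Yield (A, B, C, D): A and C fixed, B in `words`, D over all 16-bit values."""
    for b in sorted(words):
        for d in range(1 << 16):
            yield (a_const, b, c_const, d)

rng = random.Random(1)
S = build_msb_closed_set(32, rng)     # assumption: 2^5 words in S
count = sum(1 for _ in chosen_plaintexts(S, 0x0000, 0x0000))
print(len(S), count)
```

Because every word enters S together with its partner, each value of B has a counterpart that differs only in the most significant bit, which is what the second condition of Proposition 1 needs.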
5.2 The Attack on 6-Round Variant of IDEA


We now extend the 5.5-round attack to an attack on the 6-round variant of
IDEA starting before the MA layer of the second round. The data complexity
of the attack is $2^{49}$ and the time complexity is $2^{112.1}$.

According to the key schedule, $Z_5^2$ and $Z_6^2$ are included in the 103 key bits of the
5.5-round attack. Hence, we can add this half round to the 5.5-round attack without
enlarging the time complexity.

It is more difficult to construct the right plaintext pairs satisfying Proposition 1.
Consider a pair of intermediate values $X^3$ and $X'^3$ before the third round which
satisfy Proposition 1. If we partially decrypt $X^3$ and $X'^3$ using any possible $Z_5^2$
and $Z_6^2$, the only fact we know is that all the results have the same XOR of
the first and third words. The attack hence selects all the plaintexts P where
the least significant 15 bits of $P_1 \oplus P_3$ are fixed to an arbitrary 15-bit constant.
The total number of selected plaintexts is $2^{49}$. It is possible to provide $2^{48}$
plaintext pairs satisfying the conditions in Proposition 1 in the test for any $Z_5^2$, $Z_6^2$
and $Z_4^3$. This number is sufficient in any situation.


5.3 Two Key-Dependent Attacks on 5-Round IDEA Starting from 
the First Round 


We apply the key-dependent attack to the 5-round IDEA starting from the first
round. The Biryukov-Demirci Equation is reduced to

$$\mathrm{LSB}(\Delta s^2) = \mathrm{LSB}(P_2 \oplus P_3 \oplus P'_2 \oplus P'_3 \oplus C_2 \oplus C_3 \oplus C'_2 \oplus C'_3 \oplus \Delta s^1 \oplus \Delta s^3 \oplus \Delta s^4 \oplus \Delta s^5) \qquad (11)$$

We choose the plaintext pairs to satisfy Proposition 1 before the first round
by guessing $Z_4^1$, and then $\Delta s^1$ is equal to 0 as shown in Section 5.1. In order
to determine the right-hand side of Equation (11), we need to know $Z_5^3$, $Z_1^4$,
$Z_2^4$, $Z_5^4$, $Z_6^4$, $Z_1^5$, $Z_2^5$, $Z_3^5$, $Z_4^5$, $Z_5^5$, $Z_6^5$. Together with $Z_4^1$, these 12 subkeys
take the bits 75-65 from the key Z. These 119 bits only cover the most significant
nine bits of $Z_5^2$, which determines the probability distribution of $\mathrm{LSB}(\Delta s^2)$. It is
not necessary to guess the complete subkey $Z_5^2$. The attack additionally guesses the
least significant bit of $Z_5^2$ (the 72nd bit of Z), and estimates the probability of
$\mathrm{LSB}(\Delta s^2) = 1$ by Remark 1 instead. Hence, the attack divides the key space into
$2^{120}$ key-dependent subsets by the 120 key bits, and performs the individual attacks on
each key-dependent subset. The attack uses the statistical hypothesis test
to determine which subset the actual key is in. For the subkeys $Z_5^2$ for which
$\mathrm{Prob}(\mathrm{LSB}(\Delta s^2) = 1)$ cannot be approximated by Remark 1, as shown in Section 4,
the attack exhaustively searches the remaining key bits.

In this attack, it is possible that for some key-dependent subsets the expected
time of the individual attack is larger than that of a direct exhaustive search,
which means

$$2n + 2^{8}\beta > 2^{8}$$

Under this condition, the attack also uses exhaustive key search to determine the
remaining eight key bits, so that the time needed does not exceed exhaustive search.
This attack also chooses $\alpha \le 0.01$ to ensure that the attack succeeds with 99%
probability. In this case, the total expected time complexity is $2^{125.5}$ encryptions.

Our experiment shows that the attack needs at most 75 pairs of encryptions for
one test. We ask for $2^{17}$ encryptions, which can provide $2^{16}$ pairs of encryptions
and are sufficient for the test. This data complexity ($2^{17}$) is the least of all
the known attacks on the 5-round IDEA starting from the first round.

In the second attack, we try to obtain plaintext pairs satisfying Proposition 1
before the second round. In order to determine $\mathrm{LSB}(\Delta s^3)$, we need to know
the least significant bits of $\Delta s^1$, $\Delta s^2$, $\Delta s^4$ and $\Delta s^5$. Hence, the subkeys we need
to know are $Z_1^1$, $Z_2^1$, $Z_3^1$, $Z_4^1$, $Z_5^1$, $Z_6^1$, $Z_4^2$, $Z_5^3$, $Z_5^4$, $Z_1^5$, $Z_2^5$, $Z_5^5$ and $Z_6^5$. These 13
subkeys only cover 107 bits of the key Z (bits 0-106). For every guess of these 107 key
bits, we use a similar technique as before. The expected time complexity is $2^{115.3}$, which
is the least time complexity of all the known attacks on the 5-round IDEA
starting from the first round.

Because it is not possible to predict which plaintext pairs produce
intermediate pairs satisfying Proposition 1 before the second round, the
encryptions of all the $2^{64}$ plaintexts are required.


6 Conclusions 


In this paper, we formalized a scheme of identifying the actual key using the 
key-dependent distribution, called key-dependent attack. How to minimize the 
time complexity of the key-dependent attack was also discussed. With the key- 
dependent attack, we could improve known cryptanalysis results and obtain 
more powerful attacks. We presented two key-dependent attacks on IDEA. Our
attacks on the 5.5-round and 6-round variants of IDEA have the least time and data
complexities compared with the previous attacks.

We only implemented a tentative exploration of the key-dependent distribution. 
How to make full use of the key-dependent distribution, especially how to use the 
key-dependent distribution to improve existing attacks, is worth further studying. 

The attack on IDEA makes use of the relation between XOR, modular addition
and modular multiplication. We believe that the operations XOR and
modular multiplication have more properties that can be explored further [10].
Similar relations among other operations are also worth researching. How to
make full use of the Biryukov-Demirci Equation to improve attacks on IDEA
is also interesting.
Abstract. The security of cascade blockcipher encryption is an important
and well-studied problem in theoretical cryptography with practical
implications. It is well-known that double encryption improves the secu- 
rity only marginally, leaving triple encryption as the shortest reasonable 
cascade. In a recent paper, Bellare and Rogaway showed that in the 
ideal cipher model, triple encryption is significantly more secure than 
single and double encryption, stating the security of longer cascades as 
an open question. 

In this paper, we propose a new lemma on the indistinguishability of 
systems extending Maurer’s theory of random systems. In addition to 
being of independent interest, it allows us to compactly rephrase Bellare 
and Rogaway’s proof strategy in this framework, thus making the argu- 
ment more abstract and hence easy to follow. As a result, this allows 
us to address the security of longer cascades. Our result implies that 
for blockciphers with smaller key space than message space (e.g. DES), 
longer cascades improve the security of the encryption up to a certain 
limit. This partially answers the open question mentioned above. 


Keywords: cascade encryption, ideal cipher model, random system, 
indistinguishability. 


1 Introduction 


The cascade encryption is a simple and practical construction used to enlarge the
key space of a blockcipher without the need to switch to a new algorithm. Instead
of applying the blockcipher only once, it is applied $l$ times with $l$ independently
chosen keys. A prominent and widely used example of this construction is the
Triple DES encryption [13].

Many results investigating the power of the cascade construction have been
published. It is well-known that double encryption does not significantly improve
the security over single encryption due to the meet-in-the-middle attack [7]. The
marginal security gain achieved by double encryption was described in [1]. Even
and Goldreich show that a cascade of ciphers is at least as strong as the
strongest of the ciphers against attacks that are restricted to operating on full
blocks. In contrast, Maurer and Massey [11] show that for the most general
attack model, where it is for example possible that an attacker might obtain
only half the ciphertext block for a chosen message block, the cascade is only at
least as strong as the first cipher of the cascade.

In a recent paper [4], Bellare and Rogaway have claimed a lower bound on the
security of triple encryption in the ideal cipher model. Their bound implies that
for a blockcipher with key length $k$ and block length $n$, triple encryption is
indistinguishable from a random permutation as long as the distinguisher is allowed
to make not more than roughly $2^{k + \frac{1}{2}\min\{n,k\}}$ queries. This bound is significantly
higher than the known upper bound on the security of single and double encryption,
proving that triple encryption is the shortest cascade that provides a
reasonable security improvement over single encryption. Since a longer cascade is at
least as secure as a shorter one, their bound applies also to longer cascades. They
formulate as an interesting open problem to determine whether the security
improves with the length of the cascade also for lengths $l > 3$. However, the proof in
[4] contains a few bugs, which we describe in the appendix of this paper. The first
part of our contribution is to fix these errors and to reestablish the lower bound
on the security of triple encryption up to a constant factor.

Second, we have rephrased the proof into the random systems framework
introduced in [10]. Our goal here is to simplify the proof and express it on the most
abstract level possible, thus making the main line of reasoning easy to follow and
clearly separated from the two technical arguments required. To achieve this, we
extend the random systems framework by a new lemma. This lemma is a
generalization of Lemma 7 from [10] and hence also of its special case for the
game-playing scenario, the Fundamental lemma of game-playing. This was introduced
in [4] and subsequently used as an important tool in game-playing proofs (see
for example [[5§3B5]). We illustrate the use of this new lemma in our proof of the
security of cascade encryption. Apart from the simplification, this also gives us
an improvement of the result by a constant factor.

Finally, our reformulation makes it natural to consider also the security of
longer cascades. The lower bound we prove improves with the length of the
cascade $l$ for all blockciphers where $k < n$ and for moderate values of $l$. With
increasing cascade length, the bound approaches very roughly the value $2^{k+\min\{n/2,k\}}$
(the exact formula can be found in Theorem 1). The condition $k < n$ is satisfied
for example for the DES blockcipher, where the length of the key is 56 bits and
the length of one block is 64 bits. For these parameters, the result from [4] that
we reestablish proves that triple encryption is secure up to $2^{78}$ queries, but
our result shows that a cascade of length 5 is secure up to $2^{83}$ queries. The larger
the difference $n - k$, the more a longer cascade can help. This partially answers
the open question from [4].


2 Preliminaries 
2.1 Basic Notation 


Throughout the paper, we denote sets by calligraphic letters (e.g. $\mathcal{S}$). For a
finite set $\mathcal{S}$, we denote by $|\mathcal{S}|$ the number of its elements. A $k$-tuple is denoted as
$u^k = (u_1, \ldots, u_k)$, and the set of all $k$-tuples of elements of $\mathcal{U}$ is denoted as $\mathcal{U}^k$.
The composition of mappings is interpreted from left to right, i.e., $f \circ g$ denotes
the mapping $g(f(\cdot))$. The set of all permutations of $\{0,1\}^n$ is denoted by $\mathrm{Perm}(n)$
and $\mathrm{id}$ represents the identity mapping, if the domain is implicitly given. The
notation $x^{\underline{n}}$ represents the falling factorial power, i.e., $x^{\underline{n}} = x(x-1)\cdots(x-n+1)$.
The symbol $p_{\mathrm{coll}}(n,k)$ denotes the probability that $k$ independent random
variables with uniform distribution over a set of size $n$ contain a collision, i.e., that
they are not all distinct. It is well-known that $p_{\mathrm{coll}}(n,k) \le k^2/2n$. By $\mathrm{CS}(\cdot)$
we shall denote the set of all cyclic shifts of a given tuple, in other words,
$\mathrm{CS}(\pi_1, \pi_2, \ldots, \pi_r) = \{(\pi_1, \pi_2, \ldots, \pi_r), (\pi_2, \pi_3, \ldots, \pi_r, \pi_1), \ldots, (\pi_r, \pi_1, \ldots, \pi_{r-1})\}$.
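
The three pieces of notation above can be made concrete with a few lines of Python. This is an added sketch for illustration only; the function names are hypothetical and exact rational arithmetic is used so that the collision bound can be compared without rounding.

```python
from fractions import Fraction

def falling_factorial(x, n):
    """x(x-1)...(x-n+1), with the empty product equal to 1."""
    result = 1
    for j in range(n):
        result *= x - j
    return result

def p_coll(n, k):
    """Exact probability that k uniform samples from n values are not all distinct."""
    return 1 - Fraction(falling_factorial(n, k), n**k)

def cyclic_shifts(t):
    """Set of all cyclic shifts of the tuple t."""
    return {t[i:] + t[:i] for i in range(len(t))}

print(falling_factorial(5, 3))                          # 5 * 4 * 3
print(p_coll(365, 23) <= Fraction(23**2, 2 * 365))      # collision bound k^2 / 2n
print(sorted(cyclic_shifts((1, 2, 3))))
```

The collision probability is computed as one minus the probability that all samples are distinct, which is the falling factorial of n divided by n to the power k.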

We usually denote random variables and concrete values they can take on by
capital and small letters, respectively. For events $A$ and $B$ and random variables
$U$ and $V$ with ranges $\mathcal{U}$ and $\mathcal{V}$, respectively, we denote by $P_{UA|VB}$ the
corresponding conditional probability distribution, seen as a function $\mathcal{U} \times \mathcal{V} \to [0,1]$.
Here the value $P_{UA|VB}(u, v)$ is well-defined for all $u \in \mathcal{U}$ and $v \in \mathcal{V}$ such that
$P_{VB}(v) > 0$ and undefined otherwise. Two probability distributions $P_U$ and
$P_{U'}$ on the same set $\mathcal{U}$ are equal, denoted $P_U \equiv P_{U'}$, if $P_U(u) = P_{U'}(u)$ for
all $u \in \mathcal{U}$. Conditional probability distributions are equal if the equality holds
for all arguments for which both of them are defined. To emphasize the
random experiment $\mathcal{E}$ in consideration, we sometimes write it in the superscript,
e.g. $P^{\mathcal{E}}_{U|V}(u,v)$. The expected value of the random variable $X$ is denoted by
$\mathrm{E}[X] = \sum_{x \in \mathcal{X}} (x \cdot P[X = x])$. The complement of an event $A$ is denoted by $\overline{A}$.


2.2 Random Systems 


In this subsection, we present the basic notions of the random systems
framework, as introduced in [10], along with some new extensions of the framework.
The input-output behavior of any discrete system can be described by a random 
system in the spirit of the following definition. 


Definition 1. An $(\mathcal{X},\mathcal{Y})$-random system $\mathbf{F}$ is a (generally infinite) sequence of
conditional probability distributions $P^{\mathbf{F}}_{Y_i|X^iY^{i-1}}$ for all $i \ge 1$.


The behavior of the random system is specified by the sequence of conditional
probabilities $P^{\mathbf{F}}_{Y_i|X^iY^{i-1}}(y_i, x^i, y^{i-1})$ (for $i \ge 1$) of obtaining the output $y_i \in \mathcal{Y}$
on query $x_i \in \mathcal{X}$ given the previous $i-1$ queries $x^{i-1} = (x_1, \ldots, x_{i-1}) \in \mathcal{X}^{i-1}$
and their corresponding outputs $y^{i-1} = (y_1, \ldots, y_{i-1}) \in \mathcal{Y}^{i-1}$. A random system
can also be defined by a sequence of conditional probability distributions $P^{\mathbf{F}}_{Y^i|X^i}$
for $i \ge 1$. This description is often convenient, but is not minimal.

We shall use boldface letters (e.g. F) to denote both a discrete system and 
a random system corresponding to it. This should cause no confusion. We em- 
phasize that although the results of this paper are stated for random systems, 
they hold for arbitrary systems, since the only property of a system that is rel- 
evant here is its input-output behavior. It is reasonable to consider two discrete 
systems equivalent if their input-output behaviors are the same, even if their 


internal structure differs. 
Definition 2. Two systems $\mathbf{F}$ and $\mathbf{G}$ are equivalent, denoted $\mathbf{F} \equiv \mathbf{G}$, if they
correspond to the same random system, i.e., if $P^{\mathbf{F}}_{Y_i|X^iY^{i-1}} \equiv P^{\mathbf{G}}_{Y_i|X^iY^{i-1}}$ for
all $i \ge 1$.


We shall usually define a system (and hence also the corresponding random
system) by a description of its internal working, as long as the transition to the
probability distributions is straightforward. Examples of random systems that
we consider in the following are the uniform random permutation $\mathbf{P} : \{0,1\}^n \to
\{0,1\}^n$, which realizes a function randomly chosen from $\mathrm{Perm}(n)$; and the ideal
blockcipher $\mathbf{E} : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$, which realizes an independent
uniformly random permutation for each key $K \in \{0,1\}^k$. In this paper we assume
that both $\mathbf{P}$ and $\mathbf{E}$ can be queried in both directions.

We can define a distinguisher $\mathbf{D}$ for an $(\mathcal{X},\mathcal{Y})$-random system as a
$(\mathcal{Y},\mathcal{X})$-random system which is one query ahead, i.e., it is defined by the conditional
probability distributions $P^{\mathbf{D}}_{X_i|X^{i-1}Y^{i-1}}$ for all $i \ge 1$. In particular, the first query
of $\mathbf{D}$ is determined by $P^{\mathbf{D}}_{X_1}$. After a certain number of queries (say $q$), the
distinguisher outputs a bit $W_q$ depending on the transcript $(X^q, Y^q)$. For a random
system $\mathbf{F}$ and a distinguisher $\mathbf{D}$, let $\mathbf{DF}$ be the random experiment where $\mathbf{D}$
interacts with $\mathbf{F}$. Then for two $(\mathcal{X},\mathcal{Y})$-random systems $\mathbf{F}$ and $\mathbf{G}$, the
distinguishing advantage of $\mathbf{D}$ in distinguishing systems $\mathbf{F}$ and $\mathbf{G}$ by $q$ queries is defined as
$\Delta^{\mathbf{D}}_q(\mathbf{F}, \mathbf{G}) = |P^{\mathbf{DF}}(W_q = 1) - P^{\mathbf{DG}}(W_q = 1)|$. We are usually interested in the
maximal distinguishing advantage over all such distinguishers, which we denote
by $\Delta_q(\mathbf{F}, \mathbf{G}) = \max_{\mathbf{D}} \Delta^{\mathbf{D}}_q(\mathbf{F}, \mathbf{G})$.

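For a random system $\mathbf{F}$, we often consider an internal monotone condition
defined on it. Such a condition is initially satisfied (true), but once it gets
violated, it cannot become true again. We characterize such a condition by a
sequence of events $\mathcal{A} = A_0, A_1, \ldots$ such that $A_0$ always holds, and $A_i$ holds if the
condition holds after query $i$. The probability that a distinguisher $\mathbf{D}$ issuing $q$
queries makes a monotone condition $\mathcal{A}$ fail in the random experiment $\mathbf{DF}$ is
denoted by $\nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q}) = P^{\mathbf{DF}}(\overline{A_q})$ and we are again interested in the maximum
over all distinguishers, denoted by $\nu(\mathbf{F}, \overline{A_q}) = \max_{\mathbf{D}} \nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q})$. For a random
system $\mathbf{F}$ with a monotone condition $\mathcal{A} = A_0, A_1, \ldots$ and a random system
$\mathbf{G}$, we say that $\mathbf{F}$ conditioned on $\mathcal{A}$ is equivalent to $\mathbf{G}$, denoted $\mathbf{F}|\mathcal{A} \equiv \mathbf{G}$, if
$P^{\mathbf{F}}_{Y_i|X^iY^{i-1}A_i} \equiv P^{\mathbf{G}}_{Y_i|X^iY^{i-1}}$ for $i \ge 1$, for all arguments for which $P^{\mathbf{F}}_{Y_i|X^iY^{i-1}A_i}$ is
defined. The following claim was proved in [10].


Lemma 1. If $\mathbf{F}|\mathcal{A} \equiv \mathbf{G}$ then $\Delta_q(\mathbf{F}, \mathbf{G}) \le \nu(\mathbf{F}, \overline{A_q})$.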


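Let $\mathbf{F}$ be a random system with a monotone condition $\mathcal{A}$. Following [M], we
define $\mathbf{F}$ blocked by $\mathcal{A}$ to be a new random system that behaves exactly like $\mathbf{F}$
while the condition $\mathcal{A}$ is satisfied. Once $\mathcal{A}$ is violated, it only outputs a special
blocking symbol $\bot$ not contained in the output alphabet of $\mathbf{F}$. More formally,
the following mapping is applied to the $i$-th output of $\mathbf{F}$:

$$y_i \mapsto \begin{cases} y_i & \text{if } A_i \text{ holds,} \\ \bot & \text{otherwise.} \end{cases}$$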
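The following new lemma relates the optimal advantage in distinguishing two
random systems to the optimal advantage in distinguishing their blocked
counterparts.


Lemma 2. Let $\mathbf{F}$ and $\mathbf{G}$ be two random systems with monotone conditions $\mathcal{A}$
and $\mathcal{B}$ defined on them, respectively. Let $\mathbf{F}^{\bot}$ denote the random system $\mathbf{F}$ blocked
by $\mathcal{A}$ and let $\mathbf{G}^{\bot}$ denote $\mathbf{G}$ blocked by $\mathcal{B}$. Then for every distinguisher $\mathbf{D}$ we have
$\Delta^{\mathbf{D}}_q(\mathbf{F}, \mathbf{G}) \le \Delta_q(\mathbf{F}^{\bot}, \mathbf{G}^{\bot}) + \nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q})$.


Proof. Let $\mathbf{D}$ be an arbitrary distinguisher for $\mathbf{F}$ and $\mathbf{G}$. Let $\mathbf{D}'$ be a distinguisher
that works as follows: it simulates $\mathbf{D}$, but whenever it receives an answer $\bot$ to its
query, it aborts and outputs 1. Then we have $P^{\mathbf{DG}}[W_q = 1] \le P^{\mathbf{D}'\mathbf{G}^{\bot}}[W_q = 1]$
and $P^{\mathbf{D}'\mathbf{F}^{\bot}}[W_q = 1] \le P^{\mathbf{DF}}[W_q = 1] + \nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q})$.

First, let us assume that $P^{\mathbf{DG}}[W_q = 1] \ge P^{\mathbf{DF}}[W_q = 1]$. Then, using the
definition of advantage and the above inequalities, we get

$$\begin{aligned}
\Delta^{\mathbf{D}}_q(\mathbf{F}, \mathbf{G}) &= |P^{\mathbf{DG}}[W_q = 1] - P^{\mathbf{DF}}[W_q = 1]| \\
&= P^{\mathbf{DG}}[W_q = 1] - P^{\mathbf{DF}}[W_q = 1] \\
&\le P^{\mathbf{D}'\mathbf{G}^{\bot}}[W_q = 1] - (P^{\mathbf{D}'\mathbf{F}^{\bot}}[W_q = 1] - \nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q})) \\
&\le \Delta_q(\mathbf{F}^{\bot}, \mathbf{G}^{\bot}) + \nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q}),
\end{aligned}$$

which proves the lemma in this case. On the other hand, if $P^{\mathbf{DG}}[W_q = 1] <
P^{\mathbf{DF}}[W_q = 1]$, we can easily construct another distinguisher $\mathbf{D}^*$ with the same
behavior as $\mathbf{D}$ and the opposite final answer bit. Then we can proceed with
the argument as before and since $\Delta^{\mathbf{D}}_q(\mathbf{F}, \mathbf{G}) = \Delta^{\mathbf{D}^*}_q(\mathbf{F}, \mathbf{G})$ and $\nu^{\mathbf{D}}(\mathbf{F}, \overline{A_q}) =
\nu^{\mathbf{D}^*}(\mathbf{F}, \overline{A_q})$, the conclusion is valid also for the distinguisher $\mathbf{D}$.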


Lemma 2 is a generalization of both Lemma 7 from [10] and its special case,
the Fundamental lemma of game-playing from [4]. Both these lemmas describe
the special case when $\Delta_q(\mathbf{F}^{\bot}, \mathbf{G}^{\bot}) = 0$, i.e., when the distinguished systems
behave identically until some conditions are violated. Our lemma is useful in
situations where the systems are not identical even while the conditions are
satisfied, but their behavior is very similar. A good example of such a situation
is presented in the proof of Theorem 1.

A random system $\mathbf{F}$ can be used as a component of a larger system: in
particular, we shall consider constructions $\mathbf{C}(\cdot)$ such that the resulting random
system $\mathbf{C}(\mathbf{F})$ invokes $\mathbf{F}$ as a subsystem. We state the following two observations
about the composition of systems.


Lemma 3. Let $\mathbf{C}(\cdot)$ and $\mathbf{C}'(\cdot)$ be two constructions invoking an internal random
system, and let $\mathbf{F}$ and $\mathbf{G}$ be random systems. Then


(i) $\Delta_q(\mathbf{C}(\mathbf{F}), \mathbf{C}(\mathbf{G})) \le \Delta_{q'}(\mathbf{F}, \mathbf{G})$, where $q'$ is the maximum number of
invocations of any internal system $\mathbf{H}$ for any sequence of $q$ queries to $\mathbf{C}(\mathbf{H})$,
if such a value is defined.

(ii) There exists a fixed permutation $S \in \mathrm{Perm}(n)$ (represented by a
deterministic stateless system) such that $\Delta_q(\mathbf{C}(\mathbf{P}), \mathbf{C}'(\mathbf{P})) \le \Delta_q(\mathbf{C}(S), \mathbf{C}'(S))$.
Proof. The first claim comes from [10], so here we only prove the second one.
Since the random system $\mathbf{P}$ can be seen as a system that picks a permutation
uniformly at random from $\mathrm{Perm}(n)$ and then realizes this permutation, we have:

$$\Delta_q(\mathbf{C}(\mathbf{P}), \mathbf{C}'(\mathbf{P})) \le \frac{1}{(2^n)!} \sum_{S \in \mathrm{Perm}(n)} \Delta_q(\mathbf{C}(S), \mathbf{C}'(S))$$

If all the values $\Delta_q(\mathbf{C}(S), \mathbf{C}'(S))$ were smaller than $\Delta_q(\mathbf{C}(\mathbf{P}), \mathbf{C}'(\mathbf{P}))$, it would
contradict the inequality above, hence there exists a permutation $S \in \mathrm{Perm}(n)$
such that $\Delta_q(\mathbf{C}(\mathbf{P}), \mathbf{C}'(\mathbf{P})) \le \Delta_q(\mathbf{C}(S), \mathbf{C}'(S))$.


2.3 Ideal Blockciphers and Chains 


We introduce some specific notions related to the cascade encryption setting. 
Our terminology follows and extends that in [4]. 

A blockcipher with keyspace $\{0,1\}^k$ and message space $\{0,1\}^n$ is a mapping
$E : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$ such that for each $K \in \{0,1\}^k$, $E(K, \cdot)$ is a
permutation on the set $\{0,1\}^n$. Typically $E_K(x)$ is written instead of $E(K, x)$
and $E_K^{-1}(\cdot)$ refers to the inverse of the permutation $E_K(\cdot)$.

Throughout the paper, we shall work in the ideal blockcipher model, which was
recently shown to be equivalent to the random oracle model [6]. The ideal
blockcipher model is widely used to analyze blockcipher constructions (e.g. DADI)
and consists of the assumption that for each key, the blockcipher realizes an
independent random permutation.

A blockcipher can be seen as a directed graph consisting of $2^n$ vertices
representing the message space and $2^{n+k}$ edges. Each vertex $x$ has $2^k$ outgoing edges
pointing to the encryptions of the message $x$ using all possible keys. Each of the
edges is labeled by the respective key. For a fixed blockcipher $E$, we denote by

$$w(E) = \max_{x,y} |\{K \mid E_K(x) = y\}|$$

the maximal number of distinct keys mapping the plaintext $x$ onto the ciphertext
$y$, the maximum taken over all pairs of blocks $(x, y)$ (see footnote 1). Intuitively, $w(E)$ is the
weight of the heaviest edge in the graph corresponding to $E$. This also naturally
defines a random variable $w(\mathbf{E})$ for the random system $\mathbf{E}$ realizing the ideal
blockcipher.
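
To make the quantity $w(E)$ concrete, the following Python sketch computes it for a toy cipher. It is an added illustration under stated assumptions: the ideal blockcipher is modelled by drawing one independent random permutation per key, the parameters (6-bit keys, 4-bit blocks) are chosen only so that the table fits in memory, and the function names are hypothetical.

```python
import random
from collections import Counter

def random_blockcipher(k, n, rng):
    """Toy ideal cipher: one independent random permutation of 2^n values per key."""
    table = {}
    for key in range(1 << k):
        perm = list(range(1 << n))
        rng.shuffle(perm)
        table[key] = perm
    return table

def heaviest_edge(cipher):
    """w(E): the largest number of keys that map some x to the same y."""
    counts = Counter()
    for perm in cipher.values():
        for x, y in enumerate(perm):
            counts[(x, y)] += 1
    return max(counts.values())

rng = random.Random(2)
E = random_blockcipher(k=6, n=4, rng=rng)   # toy sizes, chosen only for speed
print(heaviest_edge(E))
```

With more keys than blocks, as in this toy setting, several keys must share an edge by the pigeonhole principle, so the printed weight is at least $2^{k-n}$.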


If a distinguisher makes queries to a blockcipher $E$, let $x \xrightarrow{K} y$ denote the fact
that it either made a query $E_K(x)$ and received the encryption $y$ or made a query
$E_K^{-1}(y)$ and received the decryption $x$. An $r$-chain for keys $(K_1, \ldots, K_r)$ is an
$(r+1)$-tuple $(x_0, K_1, \ldots, K_r)$ for which there exist $x_1, \ldots, x_r$ such that
$x_0 \xrightarrow{K_1} x_1 \xrightarrow{K_2} \cdots \xrightarrow{K_r} x_r$ holds. Similarly, if a fixed permutation $S$ is given and
$1 \le i < r$, then an $i$-disconnected $r$-chain for keys $(K_1, \ldots, K_r)$ with respect to $S$ is
an $(r+1)$-tuple $(x_0, K_1, \ldots, K_r)$ for which there exist $x_1, \ldots, x_r$ such that we have both
$x_0 \xrightarrow{K_{r-i+1}} x_1 \xrightarrow{K_{r-i+2}} \cdots \xrightarrow{K_r} x_i$ and $S^{-1}(x_i) = x_{i+1} \xrightarrow{K_1} x_{i+2} \xrightarrow{K_2} \cdots \xrightarrow{K_{r-i}} x_r$.
When describing chains, we sometimes explicitly refer to the permutations instead of the
keys that define them. For disconnected chains, we sometimes omit the reference
to the permutation $S$ if it is clear from the context. The purpose of the following
definition will be clear from the proof of Theorem 1.

Footnote 1: $w(E)$ was denoted as $\mathrm{Keys}^E$ in [4].


Definition 3. Let $S$ be a fixed permutation. A distinguisher examines the key
tuple $(K_1, K_2, \ldots, K_r)$ w.r.t. $S$ if it creates either an $r$-chain or an $i$-disconnected
$r$-chain w.r.t. $S$ for $(K_1, K_2, \ldots, K_r)$ for any $i \in \{1, \ldots, r-1\}$.


3 The Security of Cascade Encryption 


In this section we reestablish the lower bound on the security of triple encryption
from [4] in a more general setting. Our goal here is to simplify the proof and
make it more comprehensible thanks to the level of abstraction provided by the
random systems framework. Using Lemma 2, we also gain an improvement by
a constant factor of 2 (cf. equation (10) in [4]). However, in order to fix the
problem of the proof in [4], a new factor $l$ appears in the security bound.

Although Theorem 1 only explicitly states the security of cascades with odd
length, we point out that a simple reduction argument proves that longer
cascades cannot be less secure than shorter ones, except for a negligible term.
Therefore, our result also implicitly proves any even-length cascade to be at least as
secure as the odd-length cascade that is one step shorter.

We also point out that our bound is only useful for cascades of reasonable
length; for extremely long cascades (e.g. $l \approx 2^{k/2}$) it becomes trivial.


3.1 Proof of the Main Result 


Since this subsection aims to address the overall structure of the proof, we shall
use two technical lemmas without proof (Lemmas 4 and 5). These lemmas
correspond to Lemmas 7 and 9 from [4], which they improve and generalize. We
shall prove them in later subsections.

Let $l \ge 3$ be an odd integer. Let $\mathbf{C}_1(\cdot,\cdot)$ denote a construction which expects
two subsystems: a blockcipher $E$ and a permutation $P$. It chooses in advance $l$
uniformly distinct keys $K_1, \ldots, K_l$. These are not used by the system; their
purpose is to make $\mathbf{C}_1(\cdot,\cdot)$ comparable to the other constructions. $\mathbf{C}_1(\cdot,\cdot)$ provides
an interface to make forward and backward queries both to the blockcipher $E$
and to the permutation $P$.

On the other hand, let $\mathbf{C}_2(\cdot)$ denote a construction which expects a
blockcipher $E$ as the only subsystem. It chooses in advance $l$ uniformly random keys
$K_1, \ldots, K_l$. It provides an interface to make forward and backward queries both
to the blockcipher $E$ and to a permutation $P$, which it realizes as $E_{K_1} \circ \cdots \circ E_{K_l}$.
To achieve this, $\mathbf{C}_2(\cdot)$ queries its subsystem for all necessary values. Let $\mathbf{C}_2^d(\cdot)$
be the same construction as $\mathbf{C}_2(\cdot)$ except that it chooses the keys $K_1, \ldots, K_l$ to
be uniformly distinct.

Finally, let $C_3(\cdot,\cdot)$ denote a construction which again expects two
subsystems: a blockcipher $E$ and a permutation $P$. It chooses in advance $l$
uniformly distinct keys $K_1, \ldots, K_l$. It provides an interface to make forward
and backward queries both to the blockcipher $E$ and to the permutation $P$.
However, answers to the blockcipher queries involving the key $K_l$ are modified
to satisfy the equation $E_{K_1} \circ \cdots \circ E_{K_l} = P$. More precisely, forward
queries are realized as $E_{K_l}(x) = P(E_{K_1}^{-1}(\cdots E_{K_{l-1}}^{-1}(x) \cdots))$ and backward
queries are realized as $E_{K_l}^{-1}(y) = E_{K_{l-1}}(E_{K_{l-2}}(\cdots E_{K_1}(P^{-1}(y)) \cdots))$.
To achieve this, $C_3(\cdot,\cdot)$ queries its subsystems for all necessary values.
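
The way the third construction patches the answers under the last key can be illustrated with a small sketch. It is only a toy model under stated assumptions: the ideal blockcipher is replaced by fully sampled lookup tables (feasible only for tiny parameters), the cascade applies the first key first, and all names (`ToyIdealCipher`, `C3`, `cascade`, `sample_permutation`) are hypothetical and not taken from the paper.

```python
import random


def sample_permutation(n_bits, rng):
    """Return (forward, inverse) lookup tables of a uniform permutation on n_bits-bit values."""
    forward = list(range(1 << n_bits))
    rng.shuffle(forward)
    inverse = [0] * len(forward)
    for x, y in enumerate(forward):
        inverse[y] = x
    return forward, inverse


class ToyIdealCipher:
    """Ideal blockcipher stand-in: one independent random permutation per key."""

    def __init__(self, key_bits, n_bits, rng):
        self.tables = [sample_permutation(n_bits, rng) for _ in range(1 << key_bits)]

    def enc(self, key, x):
        return self.tables[key][0][x]

    def dec(self, key, y):
        return self.tables[key][1][y]


class C3:
    """Answers queries under the last key K_l so that the whole cascade equals P."""

    def __init__(self, cipher, perm, keys):
        self.cipher, self.perm, self.keys = cipher, perm, keys

    def enc(self, key, x):
        if key != self.keys[-1]:
            return self.cipher.enc(key, x)
        # E_{K_l}(x) = P(E_{K_1}^{-1}(... E_{K_{l-1}}^{-1}(x) ...))
        for k in reversed(self.keys[:-1]):
            x = self.cipher.dec(k, x)
        return self.perm[0][x]

    def dec(self, key, y):
        if key != self.keys[-1]:
            return self.cipher.dec(key, y)
        # E_{K_l}^{-1}(y) = E_{K_{l-1}}(... E_{K_1}(P^{-1}(y)) ...)
        x = self.perm[1][y]
        for k in self.keys[:-1]:
            x = self.cipher.enc(k, x)
        return x


def cascade(enc, keys, x):
    """Apply enc under keys[0] first and keys[-1] last."""
    for k in keys:
        x = enc(k, x)
    return x
```

With distinct keys, cascading the patched cipher over all $l$ keys reproduces $P$ on every input, which is the defining equation of the construction.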

Recall that $\mathbf{P}$ and $\mathbf{E}$ denote the uniform random permutation and the ideal
blockcipher, respectively. The following theorem bounds $\Delta_q(C_1(\mathbf{E},\mathbf{P}), C_2(\mathbf{E}))$,
the advantage in distinguishing cascade encryption of length $l$ from a random
permutation, given access to the underlying blockcipher.


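Theorem 1. For the constructions $C_1(\cdot,\cdot)$, $C_2(\cdot)$ and random systems $\mathbf{E}$, $\mathbf{P}$
defined as above we have

$$\Delta_q(C_1(\mathbf{E},\mathbf{P}), C_2(\mathbf{E})) \le \frac{2l\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil}}{(2^k)^{\underline{l}}} + 1.9 \left( \frac{lq}{2^{k+n/2}} \right)^{2/3} + \frac{l^2}{2^{k+1}},$$

where $\alpha = \max\{2e2^{k-n},\ 2n + k\lfloor l/2 \rfloor\}$.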
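
To get a feeling for the size of this bound, the following sketch evaluates its three terms numerically. It assumes that $k$ is the key length and $n$ the block length in bits, reads $(2^k)^{\underline{l}}$ as the falling factorial $2^k(2^k-1)\cdots(2^k-l+1)$, and caps the result at 1; the function names are hypothetical and floating-point arithmetic limits it to moderate parameters.

```python
import math


def falling_factorial(base, count):
    """base * (base - 1) * ... * (base - count + 1)."""
    result = 1
    for i in range(count):
        result *= base - i
    return result


def alpha(k, n, l):
    """alpha = max{2e * 2^(k-n), 2n + k * floor(l/2)}."""
    return max(2 * math.e * 2.0 ** (k - n), 2 * n + k * (l // 2))


def cascade_bound(k, n, l, q):
    """Evaluate the three terms of the bound for an odd cascade length l and q queries."""
    chain_term = (2 * l * alpha(k, n, l) ** (l // 2) * q ** math.ceil(l / 2)
                  / falling_factorial(2 ** k, l))
    relevant_term = 1.9 * (l * q / 2 ** (k + n / 2)) ** (2 / 3)
    collision_term = l ** 2 / 2 ** (k + 1)
    # An advantage never exceeds 1, so larger values carry no information.
    return min(1.0, chain_term + relevant_term + collision_term)
```

For example, `cascade_bound(56, 64, 3, 2**70)` evaluates the bound for triple encryption with DES-like parameters; the second term dominates there.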


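Proof. First, it is easy to see that $\Delta_q(C_2(\mathbf{E}), C_2^d(\mathbf{E})) \le p_{\mathrm{coll}}(2^k, l) < l^2/2^{k+1}$
and hence we have $\Delta_q(C_1(\mathbf{E},\mathbf{P}), C_2(\mathbf{E})) \le \Delta_q(C_1(\mathbf{E},\mathbf{P}), C_2^d(\mathbf{E})) + l^2/2^{k+1}$.
However, note that $C_2^d(\mathbf{E}) \equiv C_3(\mathbf{E},\mathbf{P})$; this is because in both systems the
permutations $E_{K_1}, \ldots, E_{K_l}, P$ are chosen randomly with the only restriction
that $E_{K_1} \circ \cdots \circ E_{K_l} = P$ is satisfied. Now we can use Lemma 3 to substitute the
random permutation $\mathbf{P}$ in both $C_1(\mathbf{E},\mathbf{P})$ and $C_3(\mathbf{E},\mathbf{P})$ for a fixed one. Let $S$
denote the permutation guaranteed by Lemma 3. Then we have

$$\Delta_q(C_1(\mathbf{E},\mathbf{P}), C_2^d(\mathbf{E})) = \Delta_q(C_1(\mathbf{E},\mathbf{P}), C_3(\mathbf{E},\mathbf{P})) \le \Delta_q(C_1(\mathbf{E},S), C_3(\mathbf{E},S)).$$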


Since the permutation $S$ is fixed, it now makes no sense for the distinguisher to
query this permutation; it can have the permutation $S$ hardwired.

From now on, we shall denote all queries to a blockcipher that involve one of
the keys $K_1, K_2, \ldots, K_l$ as relevant queries. Let us now consider a monotone
condition $A^h$ ($h \in \mathbb{N}$ is a parameter) defined on the random system $C_1(\mathbf{E},S)$.
The condition $A^h_q$ is satisfied if the keys $(K_1, K_2, \ldots, K_l)$ were not examined
w.r.t. $S$ (in the sense of Definition 3) by the first $q$ queries and at most $h$ of
these $q$ queries were relevant. Let $B^h$ be an analogous condition defined on
$C_3(\mathbf{E},S)$: $B^h_q$ is satisfied if the first $q$ queries did not form a chain for the
tuple $(K_1, K_2, \ldots, K_l)$ and at most $h$ of these queries were relevant. Let $G$ and
$H$ denote the random systems $C_1(\mathbf{E},S)$ and $C_3(\mathbf{E},S)$ blocked by $A^h$ and $B^h$,
respectively. Then by Lemma 2

$$\Delta_q(C_1(\mathbf{E},S), C_3(\mathbf{E},S)) \le \Delta_q(G,H) + \nu(C_1(\mathbf{E},S), \overline{A^h_q}).$$

Let us first bound the quantity $\nu(C_1(\mathbf{E},S), \overline{A^h_q})$. We can write $A^h_q$ as
$U_q \wedge V^h_q$ where $U_q$ is satisfied if the first $q$ queries did not examine the tuple
of keys $(K_1, K_2, \ldots, K_l)$ and $V^h_q$ is satisfied if at most $h$ of the first $q$ queries
were relevant. Since $\overline{A^h_q} \Leftrightarrow \overline{U_q} \vee \overline{V^h_q}$, the union bound gives us

$$\nu(C_1(\mathbf{E},S), \overline{A^h_q}) \le \nu(C_1(\mathbf{E},S), \overline{U_q}) + \nu(C_1(\mathbf{E},S), \overline{V^h_q}).$$

We prove in Lemma 4 that $\nu(C_1(\mathbf{E},S), \overline{U_q}) \le 2l\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} / (2^k)^{\underline{l}}$. Since the
keys $K_1, \ldots, K_l$ do not affect the outputs of $C_1(\mathbf{E},S)$, adaptivity does not help
when trying to violate the condition $V^h_q$; therefore we can restrict our analysis
to nonadaptive strategies for provoking $\overline{V^h_q}$. The probability that a given query
is relevant is $l/2^k$, hence the expected number of relevant queries among the
first $q$ queries is $lq/2^k$ and by Markov's inequality we have
$\nu(C_1(\mathbf{E},S), \overline{V^h_q}) \le lq/(h2^k)$. All put together,
$\nu(C_1(\mathbf{E},S), \overline{A^h_q}) \le 2l\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} / (2^k)^{\underline{l}} + lq/(h2^k)$.

It remains to bound $\Delta_q(G,H)$. These systems only differ in their behavior
for the first $h$ relevant queries, so let us make this difference explicit. Let $G_r$
be a random system that allows queries to $l$ independent random permutations
$\pi_1, \pi_2, \ldots, \pi_l$ but returns $\bot$ once the queries create an $l$-chain for any tuple in
$CS(\pi_1, \pi_2, \ldots, \pi_l)$. Let $H_r$ be a random system that allows queries to $l$ random
permutations $\pi_1, \pi_2, \ldots, \pi_l$ such that $\pi_1 \circ \pi_2 \circ \cdots \circ \pi_l = id$, but returns $\bot$ once
the queries create an $l$-chain for the tuple $(\pi_1, \pi_2, \ldots, \pi_l)$. Let $C_{h,S}(\cdot)$ be a
construction that allows queries to a blockcipher, let us denote it by $E$. In advance,
it picks $l$ random distinct keys $K_1, K_2, \ldots, K_l$. Then it realizes the queries to
$E_{K_1}, E_{K_2}, \ldots, E_{K_l}$ as $\pi_1, \pi_2, \ldots, \pi_{l-1}$ and $\pi_l \circ S$ respectively, where the
permutations $\pi_i$ for $i \in \{1, \ldots, l\}$ are provided by a subsystem. $E_K$ for all other keys $K$
are realized by $C_{h,S}(\cdot)$ as random permutations. However, $C_{h,S}(\cdot)$ only redirects
the first $h$ relevant queries to the subsystem; after this number is exceeded, it
responds to all queries by $\bot$. Intuitively, the subsystem used is responsible for
the answers to the first $h$ relevant queries (hence the subscript "r"). Since the
disconnected chains in $C_{h,S}(G_r)$ correspond exactly to the ordinary chains in
$G_r$, we have $C_{h,S}(G_r) \equiv G$ and $C_{h,S}(H_r) \equiv H$. According to Lemma 3 and
Lemma 5 below, we have $\Delta_q(G,H) \le \Delta_h(G_r,H_r) \le h^2/2^n$.

Now we can optimize the choice of the constant $h$. The part of the advantage
that depends on $h$ is $f(h) = lq/(h2^k) + h^2/2^n$. This term is minimal for
$h^* = (lq2^{n-k-1})^{1/3}$ and we get $f(h^*) < 1.9 \left( \frac{lq}{2^{k+n/2}} \right)^{2/3}$. This completes the
proof.
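
For readers who want to check the constant, the optimization step can be spelled out as follows. This is a sketch that treats $h$ as a real variable and ignores the rounding of $h^*$ to an integer, which the paper does not discuss at this point.

```latex
\begin{align*}
f(h) &= \frac{lq}{h\,2^k} + \frac{h^2}{2^n}, \\
f'(h) &= -\frac{lq}{h^2\,2^k} + \frac{2h}{2^n} = 0
  \;\Longleftrightarrow\; h^3 = lq\,2^{n-k-1}
  \;\Longrightarrow\; h^* = \left(lq\,2^{n-k-1}\right)^{1/3}, \\
f(h^*) &= \left(2^{1/3} + 2^{-2/3}\right)\left(\frac{lq}{2^{k+n/2}}\right)^{2/3}
  \approx 1.89 \left(\frac{lq}{2^{k+n/2}}\right)^{2/3}.
\end{align*}
```

Since $2^{1/3} + 2^{-2/3} \approx 1.89 < 1.9$, this agrees with the constant used in Theorem 1.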


3.2 Examining the Relevant Keys 


Here we analyze the probability that the adversary examines the relevant keys
$(K_1, \ldots, K_l)$ w.r.t. $S$ during its interaction with the random system $C_1(\mathbf{E},S)$.
This is a generalization of Lemma 7 from [4] to longer cascades, also taking
disconnected chains into account.

Lemma 4. Let the random system $C_1(\mathbf{E},S)$ and the condition $U_q$ be defined as
in the proof of Theorem 1 with the number of keys $l$ being odd. Then we have
$\nu(C_1(\mathbf{E},S), \overline{U_q}) \le 2l\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} / (2^k)^{\underline{l}}$, where $\alpha = \max\{2e2^{k-n},\ 2n + k\lfloor l/2 \rfloor\}$.


Proof. Recall that the relevant keys $K_1, \ldots, K_l$ are examined by the
distinguisher if it creates either an $l$-chain or an $i$-disconnected $l$-chain for the
tuple $(K_1, K_2, \ldots, K_l)$ for any $i \in \{1, \ldots, l-1\}$.

Let $i \in \{1, \ldots, l-1\}$ be fixed. We first bound the probability that the
distinguisher creates an $i$-disconnected $l$-chain. Since the relevant keys do not affect
the behavior of the system $C_1(\mathbf{E},S)$, this probability is equal to the number of
$l$-tuples of distinct keys for which an $i$-disconnected $l$-chain was created, divided
by the number of all $l$-tuples of distinct keys, which is $(2^k)^{\underline{l}}$. The numerator can
be upper bounded by the number of all $i$-disconnected $l$-chains that were created
(here we also count those created for non-distinct key tuples). Hence, let $\mathrm{Ch}^{E}_{i,q}$
denote the maximum number of $i$-disconnected $l$-chains any distinguisher can
create by issuing $q$ queries to a fixed blockcipher $E$ and let $\mathrm{Ch}^{\mathbf{E}}_{i,q}$ denote the
expected value of $\mathrm{Ch}^{E}_{i,q}$ with respect to the choice of $E$ by $\mathbf{E}$.

Let $G$ be a directed graph corresponding to a blockcipher $E$, as described in
Subsection 2.3. Let $H$ be the spanning subgraph of $G$ containing only the edges
that were queried by the distinguisher. Any $i$-disconnected $l$-chain consists of
$l$ edges in $H$; let us denote them as $e_1, e_2, \ldots, e_l$ following the order in which
they appear in the chain. Then for each of the odd edges $e_1, e_3, \ldots, e_l$ there
are $q$ possibilities to choose which of the queries corresponds to this edge. Once
the odd edges are fixed, they uniquely determine the vertices $x_0, x_1, \ldots, x_l$ such
that $e_j$ is $x_{j-1} \rightarrow x_j$ for $j \in \{1, 3, \ldots, l\} \setminus \{i+1\}$ and $e_{i+1}$ is $S^{-1}(x_i) \rightarrow x_{i+1}$
if $i$ is even. Since there are at most $w(E)$ possible edges to connect any pair of
vertices in $G$, there are now at most $w(E)$ possibilities to choose each of the even
edges $e_2, e_4, \ldots, e_{l-1}$ so that $e_j$ is $x_{j-1} \rightarrow x_j$ for $j \in \{2, 4, \ldots, l-1\} \setminus \{i+1\}$
and $e_{i+1}$ is $S^{-1}(x_i) \rightarrow x_{i+1}$ if $i$ is odd. Hence, $\mathrm{Ch}^{E}_{i,q} \le w(E)^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil}$ and,
taking the expectation over the choice of $E$, $\mathrm{Ch}^{\mathbf{E}}_{i,q} \le \mathrm{E}[w(E)^{\lfloor l/2 \rfloor}]\, q^{\lceil l/2 \rceil}$.

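It remains to bound the value $w(\mathbf{E})$. For this, we use the bound from [4],
where the inequality $P[w(\mathbf{E}) \ge \beta] < 2^{2n+1-\beta}$ is proved for any $\beta \ge 2e2^{k-n}$.
Using this inequality gives us

$$\mathrm{Ch}^{\mathbf{E}}_{i,q} \le \mathrm{E}[\mathrm{Ch}^{E}_{i,q} \mid w(E) < \alpha] + \mathrm{E}[\mathrm{Ch}^{E}_{i,q} \mid w(E) \ge \alpha] \cdot 2^{2n+1-\alpha}$$
$$\le \alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} + 2^{k\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} 2^{2n+1-\alpha} \le 2\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil},$$

where the last two inequalities hold since $w(E) \le 2^k$ and $\alpha \ge 2n + k\lfloor l/2 \rfloor \ge 2$.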

Putting all together, we get that the probability of forming an $i$-disconnected
$l$-chain for the keys $(K_1, K_2, \ldots, K_l)$ can be upper bounded by
$2\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} / (2^k)^{\underline{l}}$. Since this holds for each $i \in \{1, 2, \ldots, l-1\}$ and the
probability of creating an $l$-chain for the keys $(K_1, \ldots, K_l)$ can be bounded in the
same way, by the union bound we get
$\nu(C_1(\mathbf{E},S), \overline{U_q}) \le 2l\alpha^{\lfloor l/2 \rfloor} q^{\lceil l/2 \rceil} / (2^k)^{\underline{l}}$.

3.3 Distinguishing Independent and Correlated Permutations


Now we shall improve the bound on $\Delta_h(G_r,H_r)$ stated by Lemma 9 in [4].
Using the concept of conditional equivalence from [10], our result is better by a
constant factor and is applicable for the general case of $l$-cascade encryption.
Recall that $G_r$ is a random system that provides an interface to query $l$
random independent permutations² $\pi_1, \ldots, \pi_l$ in both directions. However, if
the queries of the distinguisher form an $l$-chain for any tuple of permutations
in $CS(\pi_1, \ldots, \pi_l)$, the system $G_r$ becomes blocked and answers all subsequent
queries (including the one that formed the chain) with the symbol $\bot$. On the
other hand, $H_r$ is a random system that provides an interface to query $l$ random
permutations $\pi_1, \ldots, \pi_l$ such that $\pi_1 \circ \cdots \circ \pi_l = id$, again in both directions.
Similarly, if an $l$-chain is created for any tuple in $CS(\pi_1, \ldots, \pi_l)$ (which is in this
case equivalent to creating an $l$-chain for $(\pi_1, \ldots, \pi_l)$), $H_r$ answers all
subsequent queries with the symbol $\bot$. Therefore, the value $\Delta_h(G_r,H_r)$ denotes the
best possible advantage in distinguishing $l$ independent random permutations
from $l$ random permutations correlated in the described way, without forming
an $l$-chain.


Lemma 5. Let $G_r$ and $H_r$ be the random systems defined in the proof of
Theorem 1. Then $\Delta_h(G_r,H_r) \le h^2/2^n$.


Proof. First, let us introduce some notation. In any experiment where the
permutations $\pi_1, \ldots, \pi_l$ are queried, let $\mathrm{dom}_j(\pi_i)$ denote the set of all $x \in \{0,1\}^n$
such that among the first $j$ queries, the query $\pi_i(x)$ was already answered or
some query $\pi_i^{-1}(y)$ was answered by $x$. Similarly, let $\mathrm{range}_j(\pi_i)$ be the set of all
$y \in \{0,1\}^n$ such that among the first $j$ queries, the query $\pi_i^{-1}(y)$ was already
answered or some query $\pi_i(x)$ was answered by $y$. In other words, $\mathrm{dom}_j(\pi_i)$ and
$\mathrm{range}_j(\pi_i)$ denote the domain and range of the partial function $\pi_i$ defined by
the first $j$ answers. For each pair of consecutive permutations³ $\pi_i$ and $\pi_{i+1}$, let
$\mathcal{X}_i^{(j)}$ denote the set $\{0,1\}^n \setminus (\mathrm{range}_j(\pi_i) \cup \mathrm{dom}_j(\pi_{i+1}))$ of fresh, unused values. If
$x \xrightarrow{\pi_i} y$ then we call the queries $\pi_i(x)$ and $\pi_i^{-1}(y)$ trivial and the queries $\pi_{i+1}(y)$
and $\pi_{i-1}^{-1}(x)$ are said to extend a chain if they are not trivial too.

Now we introduce an intermediate random system $S$ and show how both $G_r$
and $H_r$ are conditionally equivalent to $S$. This allows us to use Lemma 1 to
bound the advantage in distinguishing $G_r$ and $H_r$. The system $S$ also provides
an interface to query $l$ permutations $\pi_1, \ldots, \pi_l$. It works as follows: it answers any
non-trivial forward query $\pi_i(x)$ with a value chosen uniformly from the set $\mathcal{X}_i^{(j-1)}$
and any non-trivial backward query $\pi_i^{-1}(x)$ with a value chosen uniformly from
the set $\mathcal{X}_{i-1}^{(j-1)}$ (assuming it is the $j$-th query). Any trivial queries are answered
consistently with previous answers. Moreover, if the queries form an $l$-chain for
any tuple in $CS(\pi_1, \ldots, \pi_l)$, $S$ also gets blocked and responds with $\bot$ to any
further queries. Note that $S$ is only defined as long as $|\mathcal{X}_i^{(j-1)}| > 0$, but if this
is not true, we have $h > 2^n$ and the lemma holds trivially.


2 All permutations considered here are defined on the set $\{0,1\}^n$.
3 The indexing of permutations is cyclic, e.g. $\pi_{l+1}$ denotes the permutation $\pi_1$.

Let us now consider the $j$-th query that does not extend an $(l-1)$-chain
(otherwise both $G_r$ and $S$ get blocked). Then the system $G_r$ answers any
non-trivial forward query $\pi_i(x)$ by a random element uniformly chosen from
$\{0,1\}^n \setminus \mathrm{range}_{j-1}(\pi_i)$ or gets blocked if this answer would create an $l$-chain by
connecting two shorter chains. On the other hand, the system $S$ answers with a
random element uniformly chosen from $\mathcal{X}_i^{(j-1)}$, which is a subset of
$\{0,1\}^n \setminus \mathrm{range}_{j-1}(\pi_i)$. The situation for backward queries is analogous. Therefore,
let us define a monotone condition $\mathcal{K}$ on $G_r$: the event $\mathcal{K}_j$ is satisfied if $\mathcal{K}_{j-1}$ was
satisfied and the answer to the $j$-th query was picked from the set $\mathcal{X}_i^{(j-1)}$ if it
was a non-trivial forward query $\pi_i(x)$ or from the set $\mathcal{X}_{i-1}^{(j-1)}$ if it was a
non-trivial backward query $\pi_i^{-1}(y)$. Note that as long as $\mathcal{K}$ is satisfied, no $l$-chain
can emerge by connecting two shorter chains. By the previous observations and the
definition of $\mathcal{K}$, we have $G_r|\mathcal{K} \equiv S$ which by Lemma 1 implies
$\Delta_h(G_r,S) \le \nu(G_r, \overline{\mathcal{K}_h})$. The probability that $\mathcal{K}$ is violated by the $j$-th answer is

$$\frac{|\mathrm{dom}_{j-1}(\pi_{i+1}) \setminus \mathrm{range}_{j-1}(\pi_i)|}{|\{0,1\}^n \setminus \mathrm{range}_{j-1}(\pi_i)|} \le \frac{|\{0,1\}^n \setminus \mathcal{X}_i^{(j-1)}|}{|\{0,1\}^n|} \le \frac{j-1}{2^n},$$

which gives us $\nu(G_r, \overline{\mathcal{K}_h}) \le \sum_{j=1}^{h} (j-1)/2^n \le h^2/2^{n+1}$.

In the system $H_r$, the permutations $\pi_1, \ldots, \pi_l$ can be seen as $2^n$ cycles of
length $l$, each of which is formed by the edges connecting the vertices
$x, \pi_1(x), \ldots, \pi_{l-1}(\cdots \pi_1(x) \cdots), x$ for some $x \in \{0,1\}^n$ and labeled by the respective
permutations. We shall call such a cycle used if at least one of its edges was queried
in either direction⁴, otherwise we call it unused. Let us now define a monotone
condition $\mathcal{L}$ on $H_r$: the event $\mathcal{L}_j$ is satisfied if during the first $j$ queries, any
non-trivial query which did not extend an existing chain queried an unused cycle.
We claim that $H_r|\mathcal{L} \equiv S$. To see this, let us consider all possible types of
queries. If the $j$-th query $\pi_i(x)$ is trivial or it extends an $(l-1)$-chain, both
systems behave identically. Otherwise, the system $H_r$ answers with a value $y$, where
$y \notin \mathrm{range}_{j-1}(\pi_i)$ (because $\pi_i$ is a permutation) and $y \notin \mathrm{dom}_{j-1}(\pi_{i+1})$, since that
would mean that $\mathcal{L}$ was violated either earlier (if this query extends an existing
chain) or now (if it starts a new chain). All values from $\mathcal{X}_i^{(j-1)}$ have the same
probability of being $y$, because for any $y_1, y_2 \in \mathcal{X}_i^{(j-1)}$ there exists a
straightforward bijective mapping between the arrangements of the cycles consistent with
$\pi_i(x) = y_1$ or $\pi_i(x) = y_2$ (and all previous answers). Therefore, $H_r$ answers with
a uniformly chosen element from $\mathcal{X}_i^{(j-1)}$ and so does $S$. For backward queries,
the situation is analogous. By Lemma 1 this gives us $\Delta_h(S,H_r) \le \nu(H_r, \overline{\mathcal{L}_h})$.
Let the $j$-th query be a non-trivial forward query $\pi_i(x)$ that does not extend a
chain, i.e., $x \in \mathcal{X}_{i-1}^{(j-1)}$. Let $u$ denote the number of elements in $\mathcal{X}_{i-1}^{(j-1)}$ that are
in a used cycle on the position between $\pi_{i-1}$ and $\pi_i$. Then since every element
in $\mathcal{X}_{i-1}^{(j-1)}$ has the same probability of having this property (for the same reason
as above), this query violates the condition $\mathcal{L}$ with probability
$u / |\mathcal{X}_{i-1}^{(j-1)}| \le (u + |\mathrm{range}_{j-1}(\pi_{i-1}) \cup \mathrm{dom}_{j-1}(\pi_i)|)/2^n \le (j-1)/2^n$. Hence
$\nu(H_r, \overline{\mathcal{L}_h}) \le \sum_{j=1}^{h} (j-1)/2^n \le h^2/2^{n+1}$.

4 We consider a separate edge connecting two vertices for each cycle in which they
follow each other, hence each query creates at most one used cycle.

Putting everything together, we have $\Delta_h(G_r,H_r) \le \Delta_h(G_r,S) + \Delta_h(S,H_r) \le h^2/2^n$,
which completes the proof.


4 Conclusions 


In this paper, we have studied the security of cascade encryption. The most
important recent result on this topic [4] contained a few mistakes, which we
pointed out and corrected. We have formulated the proof from [4] in the random
systems framework, which allows us to describe it on a more abstract level and
thus in a more compact argument. This abstraction leads to a minor improve- 
ment for the case of triple encryption, as well as a generalization for the case of 
longer cascades. We prove that for the wide class of blockciphers with smaller 
key space than message space, a reasonable increase in the length of the cascade 
improves the encryption security. Our intention here was also to demonstrate 
the power of the random systems framework as a tool for modelling the behav- 
ior and interactions of discrete systems, with a focus towards analyzing their 
indistinguishability. 

A Problems with the Proof in [4]


The proof of a lower bound for the security of triple encryption presented in [4]
contains some errors. We describe briefly where these errors come from, assuming
the reader is familiar with the terminology and the proof from [4]. We shall be
referring to version 2.3 of the paper published at the online ePrint archive. The
proof eventually comes down to bounding the advantage in distinguishing
independent random permutations $\pi_0, \pi_1, \pi_2$ from random permutations $\pi_0, \pi_1, \pi_2$
such that $\pi_0 \circ \pi_1 \circ \pi_2 = id$ (distinguishing games $G$ and $H$). This can be done
easily if the distinguisher is allowed to extend a 2-chain by his queries, therefore
the adversary is not allowed to do that in games $G$ and $H$. To justify this, before
proceeding to this part of the proof, the authors have to argue in a more complex
setting (games $D_S$ and $R_3$) that the probability of extending a 2-chain for the
relevant keys is negligible. However, due to the construction of the adversary
$B_{S,b}$ from the adversary $B$, extending a 2-chain by $B_{S,b}$ in the experiment $H^{B_{S,b}}$
does not correspond to extending a 2-chain by $B$ in $D_S^B$, but to something we
call a disconnected chain. The same can be said about the experiments $R_3^B$ and
$G^{B_{S,b}}$. Therefore, by bounding the probability of extending a 2-chain for the
relevant keys in the experiment $R_3^B$, the authors do not bound the probability
of extending a 2-chain in the experiment $G^{B_{S,b}}$, which they later need.

The second problem of the proof in [4] lies in bounding the probability of
creating a chain using the game $L$. This is done by the equation
$P[R_3^B \text{ sets x2ch}] \le 3 \cdot 2^{-k} + P[B^L \text{ sets bad}]$ on page 19, which is also invalid.
To see this, note that the game $L$ only considers chains using the keys
$(K_0, K_1, K_2)$ in this order, while the flag x2ch in the experiment $R_3^B$ can also be
set by a chain for any cyclic shift of this triple, e.g. $(K_2, K_0, K_1)$. This is why a
new multiplicative factor $l$ appears in the security bound we have proved.

In version 3.0 of the paper [4], the second bug mentioned here was fixed,
while the first is still present in a different form. Now the games $G$ and $H_S$
can be easily distinguished by forming a disconnected chain, for example by the
following trivial adversary $B$:


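Adversary $B$
   $x_1 \stackrel{\$}{\leftarrow} \{0,1\}^n$;
   $x_2 \leftarrow \Pi(1, x_1)$; $x_3 \leftarrow \Pi(2, x_2)$; $x_0 \leftarrow S^{-1}(x_3)$; $x_1' \leftarrow \Pi(0, x_0)$;
   if $x_1 = x_1'$ return 1 else return 0;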


This problem can be fixed by introducing the concept of disconnected chains
and bounding the probability of them being constructed by the adversary, as we
do for the general case of $l$-cascades in Lemma 4.


Quantum-Secure Coin-Flipping and Applications 


Ivan Damgård and Carolin Lunemann


DAIMI, Aarhus University, Denmark 


{ivan,carolin}@cs.au.dk 


Abstract. In this paper, we prove classical coin-flipping secure in the 
presence of quantum adversaries. The proof uses a recent result of Wa- 
trous that allows quantum rewinding for protocols of a certain form. 
We then discuss two applications. First, the combination of coin-flipping 
with any non-interactive zero-knowledge protocol leads to an easy trans- 
formation from non-interactive zero-knowledge to interactive quantum 
zero-knowledge. Second, we discuss how our protocol can be applied to 
a recently proposed method for improving the security of quantum
protocols [4], resulting in an implementation without set-up assumptions.
Finally, we sketch how to achieve efficient simulation for an extended 
construction in the common-reference-string model. 


Keywords. quantum cryptography, coin-flipping, common reference 
string, quantum zero-knowledge. 


1 Introduction 


In this paper, we are interested in a standard coin-flipping protocol with classical
message exchange but where the adversary is assumed to be capable of quantum
computing. Secure coin-flipping allows two parties Alice and Bob to agree on a
uniformly random bit in a fair way, i.e., neither party can influence the value of
the coin to his advantage. The (well-known) protocol proceeds as follows: Alice
commits to a bit a, Bob then sends bit b, Alice opens the commitment and the
resulting coin is the exclusive disjunction of both bits, i.e. coin = a ⊕ b.
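
The message flow just described can be sketched in a few lines. This is only an illustration of the three messages: the commitment below is a hash-based stand-in chosen for brevity (an assumption of this sketch, at best computationally binding), not the unconditionally binding scheme the paper assumes, and no quantum-security claim is attached to it. All function names are hypothetical.

```python
import hashlib
import secrets


def commit(bit, randomness):
    """Toy commitment: SHA-256 over the bit and the randomness (illustrative stand-in)."""
    return hashlib.sha256(bytes([bit]) + randomness).digest()


def verify_opening(commitment, bit, randomness):
    """Bob's check that (bit, randomness) opens the commitment he received."""
    return bit in (0, 1) and commit(bit, randomness) == commitment


def coin_flip(alice_bit=None, bob_bit=None):
    """Run one honest execution; fixed bits can be passed in for testing."""
    a = secrets.randbits(1) if alice_bit is None else alice_bit
    r = secrets.token_bytes(32)
    c = commit(a, r)                      # message 1, Alice -> Bob: commitment
    b = secrets.randbits(1) if bob_bit is None else bob_bit  # message 2, Bob -> Alice
    if not verify_opening(c, a, r):       # message 3, Alice -> Bob: opening (a, r)
        raise ValueError("opening rejected")
    return a ^ b                          # coin = a XOR b
```

Bob chooses b only after seeing the commitment and Alice is bound to a before seeing b, which is why neither honest-looking party can steer the XOR.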

For Alice’s commitment to her first message, we assume a classical bit com- 
mitment scheme. Intuitively, a commitment scheme allows a player to commit 
to a value, while keeping it hidden (hiding property) but preserving the pos- 
sibility to later reveal the value fixed at commitment time (binding property). 
More formally, a bit commitment scheme takes a bit and some randomness as 
input. The hiding property is formalized by the non-existence of a distinguisher 
able to distinguish with non-negligible advantage between a commitment to 0 
and a commitment to 1. The binding property is fulfilled, if it is infeasible for a 
forger to open one commitment to both values 0 and 1. The hiding respectively 
binding property holds with unconditional (i.e. perfect or statistical) security 
in the classical and the quantum setting, if the distinguisher respectively the 
forger is unrestricted with respect to his (quantum-) computational power. In 
case of a polynomial-time bounded classical distinguisher respectively forger, the 
commitment is computationally hiding respectively binding. The computationally
hiding property translates to the quantum world by simply allowing the
distinguisher to be quantum. However, the case of a quantum forger cannot be
handled in such a straightforward manner, due to the difficulties of rewinding in
general quantum systems (see e.g. for discussions). 

For our basic coin-flip protocol, we assume the commitment to be
unconditionally binding and computationally hiding against a quantum adversary.¹
Thus, we achieve unconditional security against cheating Alice and
quantum-computational security against dishonest Bob. Such a commitment scheme
follows, for instance, from any pseudorandom generator [15], secure against a
quantum distinguisher. Even though the underlying computational assumption, 
on which the security of the embedded commitment is based, withstands quan- 
tum attacks, the security proof of the entire protocol and its integration into 
other applications could previously not be naturally translated from the clas- 
sical to the quantum world. Typically, security against a classical adversary is 
argued using rewinding of the adversary. But in general, rewinding as a proof 
technique cannot be directly applied, if Bob runs a quantum computer: First, 
the intermediate state of a quantum system cannot be copied PBI], and second, 
quantum measurements are in general irreversible. Hence, in order to produce a 
classical output, the simulator had to (partially) measure the quantum system 
without copying it beforehand, but then it would become generally impossible 
to reconstruct all information necessary for correct rewinding. For these rea- 
sons, no simple and straightforward security proofs for the quantum case were 
previously known. 

In this paper, we show the most natural and direct quantum analogue of the
classical security proof for standard coin-flipping, by using a recent result of
Watrous [20]. Watrous showed how to construct an efficient quantum simulator for
quantum verifiers for several zero-knowledge proof systems such as graph isomor- 
phism, where the simulation relies on the newly introduced quantum rewinding 
theorem. We now show that his quantum rewinding argument can also be applied 
to classical coin-flipping in a quantum world. 

By calling the coin-flip functionality sequentially a sufficient number of times, 
the communicating parties can interactively generate a common random string 
from scratch. The generation can then be integrated into other (classical or quan- 
tum) cryptographic protocols that work in the common-reference-string model. 
This way, several interesting applications can be implemented entirely in a simple 
manner without any set-up assumptions. Two example applications are discussed 
in the second part of the paper. 

The first application relates to zero-knowledge proof systems, an important 
building block for larger cryptographic protocols. Recently, Hallgren et al. [13]
showed that any honest verifier zero-knowledge protocol can be made zero- 
knowledge against any classical and quantum verifier. Here we show a related 


1 Recall that unconditionally secure commitments, i.e. unconditionally hiding and 
binding at the same time, are impossible in both the classical and the quantum 
world. 
result, namely, a simple transformation from non-interactive (quantum)
zero-knowledge to interactive quantum zero-knowledge. A non-interactive
zero-knowledge proof system can be trivially turned into an interactive honest
verifier zero-knowledge proof system by just letting the verifier choose the reference
string. Therefore, this consequence of our result also follows from [13]. However,
our proof is much simpler. In general, the difference between our work and [13] is
that our focus is on establishing coin-flipping as a stand-alone tool that can be used
in several contexts rather than being integrated in a zero-knowledge construction
as in [13].

As a second application we discuss the interactive generation of a common
reference string for the general compiler construction improving the security of a
large class of quantum protocols that was recently proposed in [4]. Applying the
compiler, it has been shown how to achieve hybrid security in existing protocols
for password-based identification [6] and oblivious transfer [1] without significant
efficiency loss, such that an adversary must have both large quantum memory
and large computing power to break the protocol. Here we show how a common
reference string for the compiler can be generated from scratch according to the
specific protocol requirements in [4].

Finally, we sketch an extended commitment scheme for quantum-secure coin- 
flipping in the common-reference-string model. This construction can be effi- 
ciently simulated without the need of rewinding, which is necessary to claim 
universal composability. 


2 Preliminaries 


2.1 Notation 


We assume the reader’s familiarity with basic notation and concepts of quantum 
information processing as in standard literature, e.g. [16]. Furthermore, we will
only give the details of the discussed applications that are most important in 
the context of this work. A full description of the applications can be found in 
the referenced papers. 

We denote by $negl(n)$ any function of $n$ such that for any polynomial $p$ it holds
that $negl(n) < 1/p(n)$ for large enough $n$. As a measure of closeness of two quantum
states $\rho$ and $\sigma$, their trace distance $\delta(\rho, \sigma) = \frac{1}{2} tr(|\rho - \sigma|)$ or square-fidelity $\langle\rho|\sigma|\rho\rangle$
can be applied. A quantum algorithm consists of a family $\{C_n\}_{n \in \mathbb{N}}$ of quantum
circuits and is said to run in polynomial time, if the number of gates of $C_n$ is
polynomial in $n$. Two families of quantum states $\{\rho_n\}_{n \in \mathbb{N}}$ and $\{\sigma_n\}_{n \in \mathbb{N}}$ are called
quantum-computationally indistinguishable, denoted $\rho \stackrel{q}{\approx} \sigma$, if any
polynomial-time quantum algorithm has negligible advantage in $n$ of distinguishing $\rho_n$ from
$\sigma_n$. Analogously, they are statistically indistinguishable, denoted $\rho \stackrel{s}{\approx} \sigma$, if their
trace distance is negligible in $n$. For the reverse circuit of quantum circuit $Q$, we
use the standard notation for the transposed, complex conjugate operation, i.e.
$Q^\dagger$. The controlled-NOT operation (CNOT) with a control and a target qubit
as input flips the target qubit, if the control qubit is 1. In other words, the value
of the second qubit corresponds to the classical exclusive disjunction (XOR). A
phase-flip operation can be described by Pauli operator $Z$. For quantum state $\rho$
stored in register $R$ we write $|\rho\rangle_R$.
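
As a small worked instance of the trace distance, the sketch below handles the single-qubit case only. It relies on the fact that the difference of two density matrices is Hermitian with trace zero, so for 2x2 matrices it has the form [[a, b], [conj(b), -a]] with eigenvalues ±sqrt(a² + |b|²); half the sum of their absolute values is sqrt(a² + |b|²). The function name and the nested-list matrix representation are assumptions of this sketch.

```python
import math


def trace_distance_qubit(rho, sigma):
    """Trace distance (1/2) tr|rho - sigma| for 2x2 density matrices given as nested lists."""
    a = (rho[0][0] - sigma[0][0]).real   # top-left entry of the difference (real for Hermitian input)
    b = rho[0][1] - sigma[0][1]          # off-diagonal entry of the difference
    return math.sqrt(a * a + abs(b) ** 2)


# Density matrices of |0>, |1> and |+> = (|0> + |1>)/sqrt(2).
ZERO = [[1.0, 0.0], [0.0, 0.0]]
ONE = [[0.0, 0.0], [0.0, 1.0]]
PLUS = [[0.5, 0.5], [0.5, 0.5]]
```

Orthogonal states such as `ZERO` and `ONE` are at distance 1, while `ZERO` and `PLUS` are at distance sqrt(1/2), matching the pure-state formula sqrt(1 - |⟨ψ|φ⟩|²).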


2.2 Definition of Security 


We follow the framework for defining security which was introduced in [8] and
also used in [4]. Our cryptographic two-party protocols run between player
Alice, denoted by A, and player Bob (B). Dishonest parties are indicated by $A^*$
and $B^*$, respectively. The security against a dishonest player is based on the
real/ideal-world paradigm that assumes two different worlds: the real world that
models the actual protocol $\Pi$ and the ideal world based on the ideal
functionality $\mathcal{F}$ that describes the intended behavior of the protocol. If both executions
are indistinguishable, security of the protocol in real life follows. In other words,
a dishonest real-world player $P^*$ that attacks the protocol cannot achieve
(significantly) more than an ideal-world adversary $\hat{P}^*$ attacking the corresponding
ideal functionality.

More formally, the joint input state consists of classical inputs of honest
parties and possibly quantum input of dishonest players. A protocol $\Pi$
consists of an infinite family of interactive (quantum) circuits for parties A and
B. A classical (non-reactive) ideal functionality $\mathcal{F}$ is given by a conditional
probability distribution $P_{\mathcal{F}(in_A, in_B) | in_A in_B}$, inducing a pair of random variables
$(out_A, out_B) = \mathcal{F}(in_A, in_B)$ for every joint distribution of $in_A$ and $in_B$, where
$in_P$ and $out_P$ denote party P's in- and output, respectively. For the definition
of (quantum-) computational security against a dishonest Bob, a
polynomial-size (quantum) input sampler is considered, which produces the input state of
the parties.


Definition 2.1 (Correctness). A protocol $\Pi$ correctly implements an ideal
classical functionality $\mathcal{F}$, if for every distribution of the input values of
honest Alice and Bob, the resulting common outputs of $\Pi$ and $\mathcal{F}$ are statistically
indistinguishable.


Definition 2.2 (Unconditional security against dishonest Alice). A
protocol $\Pi$ implements an ideal classical functionality $\mathcal{F}$ unconditionally securely
against dishonest Alice, if for any real-world adversary $A^*$, there exists an
ideal-world adversary $\hat{A}^*$, such that for any input state it holds that the output state,
generated by $A^*$ through interaction with honest B in the real world, is
statistically indistinguishable from the output state, generated by $\hat{A}^*$ through interaction
with $\mathcal{F}$ and $A^*$ in the ideal world.


Definition 2.3 ((Quantum-) Computational security against dishonest
Bob). A protocol $\Pi$ implements an ideal classical functionality $\mathcal{F}$ (quantum-)
computationally securely against dishonest Bob, if for any (quantum-)
computationally bounded real-world adversary $B^*$, there exists a (quantum-)
computationally bounded ideal-world adversary $\hat{B}^*$, such that for any efficient input sampler,
it holds that the output state, generated by $B^*$ through interaction with honest A
in the real world, is (quantum-) computationally indistinguishable from the
output state, generated by $\hat{B}^*$ through interaction with $\mathcal{F}$ and $B^*$ in the ideal world.


For more details and a definition of indistinguishability of quantum states,
see [8]. There, it has also been shown that protocols satisfying the above
definitions compose sequentially in a classical environment. Furthermore, note that in
Definition 2.2 we do not necessarily require the ideal-world adversary $\hat{A}^*$ to be
efficient. We show in SectionJhow to extend our coin-flipping construction such 
that we can achieve an efficient simulator. 

The coin-flipping scheme in Section 5 as well as the example applications in
Sections 4.1 and 4.2 work in the common-reference-string (CRS) model. In this
model, all participants in the real-world protocol have access to a classical
public CRS, which is chosen before any interaction starts, according to a
distribution only depending on the security parameter. However, the
participants in the ideal-world interacting with the ideal functionality do
not make use of the CRS. Hence, an ideal-world simulator P̂* that operates by
simulating a real-world adversary P* is free to choose a string in any way he
wishes.


3 Quantum-Secure Coin-Flipping 


3.1 The Coin-Flip Protocol 


Let n indicate the security parameter of the commitment scheme which underlies
the protocol. We use an unconditionally binding and quantum-computationally
hiding commitment scheme that takes a bit and some randomness r of length
l as input, i.e. com : {0,1} × {0,1}^l → {0,1}^{l+1}. The unconditionally
binding property is fulfilled, if it is impossible for any forger to open one
commitment to both 0 and 1, i.e. to compute r, r' such that
com(0,r) = com(1,r'). Quantum-computational hiding is ensured, if no quantum
distinguisher can distinguish between com(0,r) and com(1,r') for random r, r'
with non-negligible advantage. As mentioned earlier, for a specific
instantiation we can use, for instance, Naor’s commitment based on a
pseudorandom generator [15]. This scheme does not require any initially shared
secret information and is secure against a quantum distinguisher.²

We let Alice and Bob run the Coin-Flip Protocol (see Fig. 1), which
interactively generates a random and fair coin in one execution and does not
require any set-up assumptions. Correctness is obvious by inspection of the
protocol: If both players are honest, they independently choose random bits.
These bits are then combined via exclusive disjunction (XOR), resulting in a
uniformly random coin.

The corresponding ideal coin-flip functionality F_COIN is described in Figure 2.
Note that dishonest A* may refuse to open com(a,r) in the real-world after
learning B’s input. For this case, F_COIN allows her a second input REFUSE,
leading to output FAIL and modeling the abort of the protocol.


2 We describe the commitment scheme in this simple notation. However, if it is based
on a specific scheme, e.g. [15], the precise notation has to be slightly adapted.
Coin-Flip Protocol


1. A chooses a ∈_R {0,1} and computes com(a,r). She sends com(a,r) to B.
2. B chooses b ∈_R {0,1} and sends b to A.
3. A sends open(a,r) and B checks if the opening is valid.
4. Both compute coin = a ⊕ b.


Fig. 1. The Coin-Flip Protocol 
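The four steps of Fig. 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the constant `R_BYTES`, and the hash-based `com` are hypothetical stand-ins chosen here, and a SHA-256 commitment is neither unconditionally binding nor Naor's scheme, so it does not satisfy the requirements stated above.

```python
import hashlib
import secrets

R_BYTES = 16  # assumed randomness length for the toy commitment (hypothetical)


def com(a: int, r: bytes) -> bytes:
    """Toy commitment to bit a with randomness r (hash-based stand-in only)."""
    return hashlib.sha256(bytes([a]) + r).digest()


def check_opening(c: bytes, a: int, r: bytes) -> bool:
    """B's check in Step 3: does (a, r) open the commitment c?"""
    return a in (0, 1) and len(r) == R_BYTES and com(a, r) == c


def coin_flip() -> int:
    """One honest execution of the four protocol steps."""
    # Step 1: A picks a random bit a, commits to it and sends the commitment.
    a = secrets.randbits(1)
    r = secrets.token_bytes(R_BYTES)
    c = com(a, r)
    # Step 2: B picks a random bit b and sends it to A.
    b = secrets.randbits(1)
    # Step 3: A opens the commitment; B verifies the opening.
    if not check_opening(c, a, r):
        raise ValueError("invalid opening, B aborts")
    # Step 4: both parties compute coin = a XOR b.
    return a ^ b
```

Because b is sent only after the commitment to a is fixed, and a stays hidden until Step 3, neither honest input can depend on the other.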


Ideal Functionality F_COIN:


Upon receiving requests start from Alice and Bob, F_COIN outputs a uniformly
random coin to Alice. It then waits to receive Alice’s second input ok or REFUSE
and outputs coin or FAIL to Bob, respectively.


Fig. 2. The Ideal Coin-Flip Functionality 
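The behaviour of the functionality in Fig. 2 can be mirrored by a small state machine. The class and method names below are hypothetical and chosen for illustration; the sketch only models the message order described in the figure.

```python
import secrets


class CoinFunctionality:
    """Toy model of the ideal coin-flip functionality of Fig. 2."""

    def __init__(self):
        self._started = set()
        self._coin = None

    def start(self, party: str) -> None:
        """Record a start request from party 'A' or 'B'."""
        if party not in ("A", "B"):
            raise ValueError("unknown party")
        self._started.add(party)
        # The coin is sampled once both start requests have arrived.
        if self._started == {"A", "B"} and self._coin is None:
            self._coin = secrets.randbits(1)

    def output_to_alice(self) -> int:
        """Alice learns the coin first."""
        if self._coin is None:
            raise RuntimeError("both parties must send start first")
        return self._coin

    def second_input(self, msg: str):
        """Alice's second input decides what Bob receives."""
        if self._coin is None:
            raise RuntimeError("both parties must send start first")
        if msg == "ok":
            return self._coin
        if msg == "REFUSE":
            return "FAIL"
        raise ValueError("second input must be 'ok' or 'REFUSE'")
```

The asymmetry is deliberate: Alice sees the coin before deciding between ok and REFUSE, which models her ability to abort after learning the outcome in the real protocol.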


3.2 Security 


Theorem 3.1. The Coin-Flip Protocol is unconditionally secure against
any unbounded dishonest Alice according to Definition 2.2, provided that the
underlying commitment scheme is unconditionally binding.


Proof. We construct an ideal-world adversary Â*, such that the real output of
the protocol is statistically indistinguishable from the ideal output produced by
Â*, F_COIN and A*.

First note that a, r and com(a,r) are chosen and computed as in the real
protocol. From the statistically binding property of the commitment scheme, it
follows that A*’s choice bit a is uniquely determined from com(a,r), since for any
com, there exists at most one pair (a,r) such that com = com(a,r) (except with
probability negligible in n). Hence in the real-world, A* is unconditionally bound
to her bit before she learns B’s choice bit, which means a is independent of b.
Therefore in Step 2, the simulator can correctly (but not necessarily efficiently)
compute a (and r). Note that, in the case of unconditional security, we do not
have to require the simulation to be efficient. We show in Section 5 how to
extend the commitment in order to extract A*’s inputs efficiently. Finally, due
to the properties of XOR, A* cannot tell the difference between the random b
computed (from the ideal, random coin) in the simulation in Step 3 and the
randomly chosen b of the real-world. It follows that the simulated output is
statistically indistinguishable from the output in the real protocol.
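The inefficient extraction step of this simulation (Steps 2 and 3 of the ideal-world simulation against dishonest Alice) can be illustrated by exhaustive search over all openings. The sketch below is a toy under loud assumptions: the randomness has only `L_BITS = 8` bits so that brute force is feasible, and the hash-based `com` is a hypothetical stand-in that is not unconditionally binding in the sense of the theorem.

```python
import hashlib

L_BITS = 8  # toy randomness length so that exhaustive search is feasible


def com(a: int, r: int) -> bytes:
    """Toy commitment to bit a with randomness r in [0, 2**L_BITS)."""
    return hashlib.sha256(bytes([a, r])).digest()


def extract(c: bytes):
    """Unbounded simulator: search all (a, r) that open the commitment c."""
    openings = [(a, r) for a in (0, 1) for r in range(2 ** L_BITS)
                if com(a, r) == c]
    # Binding means that all valid openings agree on the committed bit.
    if len({a for a, _ in openings}) != 1:
        raise ValueError("commitment does not determine a unique bit")
    return openings[0]


def simulate_against_alice(c: bytes, coin: int):
    """Extract a from c, then answer with b = coin XOR a."""
    a, _r = extract(c)
    b = coin ^ a
    return a, b
```

Whatever bit Alice committed to, the reply b forces a XOR b to equal the coin handed out by the ideal functionality, and b is uniformly random whenever the coin is.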


To prove security against any dishonest quantum-computationally bounded B*,
we show that there exists an ideal-world simulation B̂* with output
quantum-computationally indistinguishable from the output of the protocol in the
Ideal-World Simulation Â*:


1. Upon receiving com(a,r) from A*, Â* sends start and then ok to F_COIN as first
and second input, respectively, and receives a uniformly random coin.
2. Â* computes a and r from com(a,r).
3. Â* computes b = coin ⊕ a and sends b to A*.
4. Â* waits to receive A*’s last message and outputs whatever A* outputs.


Fig. 3. The Ideal-World Simulation Â*


real-world. In a classical simulation, where we can simply use rewinding, a
polynomial-time simulator works as follows. It requests coin from F_COIN, chooses
random a and r, and computes b' = coin ⊕ a as well as com(a,r). It then sends
com(a,r) to B* and receives B*’s choice bit b. If b = b', the simulation was
successful. Otherwise, the simulator rewinds B* and repeats the simulation. Note
that our security proof should hold also against any quantum adversary. The
polynomial-time quantum simulator proceeds similarly to its classical analogue
but requires quantum registers as work space and relies on the quantum
rewinding lemma of Watrous (see Lemma 1 in Appendix A).
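The classical rewinding simulator described in this paragraph can be sketched as a retry loop. Here a dishonest Bob is modelled, as an assumption, by a stateless function `bob` mapping a commitment to a bit, so that "rewinding" amounts to calling it again on a fresh commitment; `com`, `simulate_against_bob` and `max_tries` are hypothetical names, and the hash-based commitment is only a stand-in.

```python
import hashlib
import secrets


def com(a: int, r: bytes) -> bytes:
    """Toy commitment (hash-based stand-in, not the scheme of the paper)."""
    return hashlib.sha256(bytes([a]) + r).digest()


def simulate_against_bob(bob, coin: int, max_tries: int = 128):
    """Classical rewinding: retry until Bob's bit matches the guess b'.

    bob  : function mapping a commitment to a bit b (restarted on each try)
    coin : the bit obtained from the ideal functionality
    """
    for tries in range(1, max_tries + 1):
        a = secrets.randbits(1)
        r = secrets.token_bytes(16)
        b_guess = coin ^ a          # b' = coin XOR a
        c = com(a, r)
        b = bob(c)
        if b == b_guess:            # success: a XOR b equals coin
            return (c, b, (a, r)), tries
        # Otherwise rewind: discard this attempt and start over.
    raise RuntimeError("simulation did not succeed within max_tries")
```

If the commitment hides a, each attempt succeeds with probability close to 1/2, so the expected number of attempts is about two.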

In the paper, Watrous proves how to construct a quantum zero-knowledge
proof system for graph isomorphism using his (ideal) quantum rewinding lemma.
The protocol proceeds as a Σ-protocol, i.e. a protocol in three-move form, where
the verifier flips a single coin in the second step and sends this challenge to the
prover. Since these are the essential aspects also in our Coin-Flip Protocol,
we can apply Watrous’ quantum rewinding technique (with slight modifications)
as a black-box to our protocol. We also follow his notation and line of argument
here. For a more detailed description and proofs, we refer to [20].


Theorem 3.2. The Coin-Flip Protocol is quantum-computationally secure
against any polynomial-time bounded, dishonest Bob according to Definition 2.3,
provided that the underlying commitment scheme is quantum-computationally
hiding and the success probability of quantum rewinding achieves a non-negligible
lower bound p_0.


Proof. Let W denote B*’s auxiliary input register, containing an ñ-qubit state
|ψ⟩. Furthermore, let V and B denote B*’s work space, where V is an arbitrary
polynomial-size register and B is a single qubit register. A’s classical messages
are considered in the following as being stored in quantum registers A_1 and
A_2. In addition, the quantum simulator uses registers R, containing all possible
choices of a classical simulator, and G, representing its guess b' on B*’s message
b in the second step. Finally, let X denote a working register of size k, which is
initialized to the state |0^k⟩ and corresponds to the collection of all registers as
described above except W.
The quantum rewinding procedure is implemented by a general quantum circuit
R_coin with input (W, X, B*, coin). As a first step, it applies a unitary
(ñ, k)-quantum circuit Q to (W, X) to simulate the conversation, obtaining
registers (G, Y). Then, a test takes place to observe whether the simulation
was successful. In that case, R_coin outputs the resulting quantum register.
Otherwise, it quantumly rewinds by applying the reverse circuit Q† on (G, Y) to
retrieve (W, X) and then a phase-flip transformation on X before another
iteration of Q is applied. Note that R_coin is essentially the same circuit as R
described in [20], but in our application it depends on the value of a given
coin, i.e., we apply R_0 or R_1 for coin = 0 or coin = 1, respectively. In more
detail, Q transforms (W, X) to (G, Y) by the following unitary operations:


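(1) It first constructs the superposition

\[
\frac{1}{\sqrt{2^{l+1}}} \sum_{a,r} |a,r\rangle_{R} \, |\mathrm{com}(a,r)\rangle_{A_1} \, |b' = \mathrm{coin} \oplus a\rangle_{G} \, |\mathrm{open}(a,r)\rangle_{A_2} \, |0\rangle_{B} \, |0^{k'}\rangle_{V} \, |\psi\rangle_{W} ,
\]

where k' < k. Note that the state of registers (A_1, G, A_2) corresponds to a
uniform distribution of possible transcripts of the interaction between the
players.

(2) For each possible com(a,r), it then simulates B*’s possible actions by
applying a unitary operator to (W, V, B, A_1) with A_1 as control:

\[
\frac{1}{\sqrt{2^{l+1}}} \sum_{a,r} |a,r\rangle_{R} \, |\mathrm{com}(a,r)\rangle_{A_1} \, |b'\rangle_{G} \, |\mathrm{open}(a,r)\rangle_{A_2} \, |b\rangle_{B} \, |\tilde{\varphi}\rangle_{V} \, |\tilde{\psi}\rangle_{W} ,
\]

where φ̃ and ψ̃ describe modified quantum states.

(3) Finally, a CNOT-operation is applied to pair (B, G) with B as control to
check whether the simulator’s guess of B*’s choice was correct. The result of
the CNOT-operation is stored in register G:

\[
\frac{1}{\sqrt{2^{l+1}}} \sum_{a,r} |a,r\rangle_{R} \, |\mathrm{com}(a,r)\rangle_{A_1} \, |b' \oplus b\rangle_{G} \, |\mathrm{open}(a,r)\rangle_{A_2} \, |b\rangle_{B} \, |\tilde{\varphi}\rangle_{V} \, |\tilde{\psi}\rangle_{W} .
\]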


If we denote with Y the register that contains the residual (ñ + k − 1)-qubit
state, the transformation from (W, X) to (G, Y) by applying Q can be written as

\[
Q \left( |\psi\rangle_{W} |0^{k}\rangle_{X} \right) = \sqrt{p} \, |0\rangle_{G} |\varphi_{good}(\psi)\rangle_{Y} + \sqrt{1-p} \, |1\rangle_{G} |\varphi_{bad}(\psi)\rangle_{Y} ,
\]

where 0 < p < 1 and |φ_good(ψ)⟩ denotes the state we want the system to be
in for a successful simulation. R_coin then measures the qubit in register G with
respect to the standard basis, which indicates success or failure of the simulation.
A successful execution (where b = b') results in outcome 0 with probability p. In
that case, R_coin outputs Y. A measurement outcome 1 indicates b ≠ b', in which
case R_coin quantumly rewinds the system, applies a phase-flip (on register X)
and repeats the simulation, i.e. it applies

\[
Q \left( 2 \left( \mathbb{1} \otimes |0^{k}\rangle\langle 0^{k}| \right) - \mathbb{1} \right) Q^{\dagger} .
\]


Watrous’ ideal quantum rewinding lemma (without perturbations) then states
the following: Under the condition that the probability p of a successful
simulation is non-negligible and independent of any auxiliary input, the output
ρ(ψ) of R has square-fidelity close to 1 with state |φ_good(ψ)⟩ of a successful
simulation, i.e.,

\[
\langle \varphi_{good}(\psi) | \, \rho(\psi) \, | \varphi_{good}(\psi) \rangle \geq 1 - \varepsilon
\]

with error bound 0 < ε < 1/2. Note that for the special case where p equals 1/2
and is independent of |ψ⟩, the simulation terminates after at most one rewinding.
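The special case p = 1/2 can be checked numerically on a two-qubit toy example. The circuit Q below is a hypothetical choice made for illustration, not the circuit of the proof: W is a single qubit holding |ψ⟩, X is a single qubit that also plays the role of G after Q, and Q applies a Hadamard to X followed by a CNOT controlled on X. With this Q the success probability is 1/2 for every |ψ⟩, and one application of the rewinding operator to the failed branch yields outcome 0 with certainty.

```python
import math

S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[S, S], [S, -S]]
Z = [[1, 0], [0, -1]]  # equals 2|0><0| - 1 on a single qubit
# Basis index = 2*w + x for W (first qubit) and X (second qubit).
# CNOT with X as control and W as target: |w, x> -> |w XOR x, x>.
CNOT = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(m):
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


Q = matmul(CNOT, kron(I2, H))            # toy simulation circuit
REFLECT = kron(I2, Z)                     # 2(1 (x) |0><0|) - 1 on register X
REWIND = matmul(matmul(Q, REFLECT), dagger(Q))


def run(psi):
    """Return (p, failure probability after one rewinding, output state Y)."""
    state = apply(Q, [psi[0], 0, psi[1], 0])       # Q(|psi>_W |0>_X)
    p = abs(state[0]) ** 2 + abs(state[2]) ** 2    # probability of G = 0
    failed = [0, state[1], 0, state[3]]            # project onto G = 1
    norm = math.sqrt(sum(abs(x) ** 2 for x in failed))
    failed = [x / norm for x in failed]
    after = apply(REWIND, failed)
    p_fail_after = abs(after[1]) ** 2 + abs(after[3]) ** 2
    return p, p_fail_after, [after[0], after[2]]
```

In this toy case the register Y after rewinding holds |ψ⟩ itself, which is the good state of the example.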

However, we cannot apply the exact version of Watrous’ rewinding lemma in our
simulation, since the commitment scheme in the protocol is only (quantum-)
computationally hiding. Instead, we must allow for small perturbations in the
quantum rewinding procedure as follows. Let adv denote B*’s advantage over a
random guess on the committed value due to his computing power, i.e.
adv = |p − 1/2|. From the hiding property, it follows that adv is negligible in
the security parameter n. Thus, we can argue that the success probability p is
close to independent of the auxiliary input and Watrous’ quantum rewinding
lemma with small perturbations, as stated in the appendix (Lemma 1), applies
with q = 1/2 and ε = adv. All operations in Q can be performed by
polynomial-size circuits, and thus, the simulator has polynomial size (in the
worst case). Furthermore, for negligible ε but non-negligible lower bound p_0 on
the success probability p, it follows that the “closeness” of output ρ(ψ) with
good state |φ_good(ψ)⟩ is slightly reduced but quantum rewinding remains
possible.

Finally, to prove security against quantum B*, we construct an ideal-world
quantum simulator B̂* (see Fig. 4), interacting with B* and the ideal
functionality F_COIN and executing Watrous’ quantum rewinding algorithm. We then
compare the output states of the real process and the ideal process. In case of
indistinguishable outputs, quantum-computational security against B* follows.


Ideal-World Simulation B̂*:


1. B̂* gets B*’s auxiliary quantum input W and working registers X.
2. B̂* sends start and then ok to F_COIN. It receives a uniformly random coin.
3. Depending on the value of coin, B̂* applies the corresponding circuit R_coin
with input W, X, B* and coin.
4. B̂* receives output register Y with |φ_good(ψ)⟩ and “measures the conversation”
to retrieve the corresponding (com(a,r), b, open(a,r)). It outputs whatever B*
outputs.


Fig. 4. The Ideal-World Simulation B̂*
First note that the superposition constructed as described above in circuit Q
as Step (1) corresponds to all possible random choices of values in the real
protocol. Furthermore, the circuit models any possible strategy of quantum B* in
Step (2), depending on control register |com(a,r)⟩_{A_1}. The CNOT-operation on
(B, G) in Step (3), followed by a standard measurement of G, indicates whether
the guess b' on B*’s choice b was correct. If that was not the case (i.e. b ≠ b'
and measurement result 1), the system gets quantumly rewound by applying
reverse transformations (3)-(1), followed by a phase-flip operation. The procedure
is repeated until the measurement outcome is 0 and hence b = b'. Watrous’
technique then guarantees that, assuming negligible ε and non-negligible p_0, ε'
is negligible and thus, the final output ρ(ψ) of the simulation is close to good
state |φ_good(ψ)⟩. It follows that the output of the ideal simulation is
indistinguishable from the output in the real-world for any
quantum-computationally bounded B*.


4 Applications 


4.1 Interactive Quantum Zero-Knowledge 


Zero-knowledge proofs are an important building block for larger cryptographic 
protocols. The notion of (interactive) zero-knowledge (ZK) was introduced by 
Goldwasser et al. [I]. Informally, ZK proofs for any NP language L yield no 
other knowledge to the verifier than the validity of the assertion proved, i.e. 
x ∈ L. Thus, only this one bit of knowledge is communicated from prover to
verifier and zero additional knowledge. For a survey about zero-knowledge, see 
for instance DO. 

Blum et al. [2] showed that the interaction between prover and verifier in any
ZK proof can be replaced by sharing a short, random common reference string
according to some distribution and available to all parties from the start of the
protocol. Note that a CRS is a weaker requirement than interaction. Since all
information is communicated in one direction only, from prover to verifier, we
do not have to require any restriction on the verifier.

As in the classical case, where ZK protocols exist if one-way functions exist,
quantum zero-knowledge (QZK) is possible under the assumption that quantum
one-way functions exist. In [14], Kobayashi showed that a common reference
string or shared entanglement is necessary for non-interactive quantum
zero-knowledge. Interactive quantum zero-knowledge protocols in restricted
settings were proposed by Watrous in the honest verifier setting and by Damgård
et al. in the CRS model [5], where the latter introduced the first Σ-protocols
for QZK withstanding even active quantum attacks. In [20], Watrous then proved
that several interactive protocols are zero-knowledge against general quantum
attacks.

Recently, Hallgren et al. [13] showed how to transform a Σ-protocol with
stage-by-stage honest verifier zero-knowledge into a new Σ-protocol that is
zero-knowledge against all classical and quantum verifiers. They propose special
bit commitment schemes to limit the number of rounds, and view each round as a
IQZK^{F_COIN} Protocol:


(COIN)
1. A and B invoke F_COIN k times. If A blocks any output coin_i for i = 1,...,k
(by sending REFUSE as second input), B aborts the protocol.
(CRS)
2. A and B compute ω = coin_1 ... coin_k.
(NIZK)
3. A sends π(ω, x) to B. B checks the proof and accepts or rejects accordingly.


Fig. 5. Intermediate Protocol for IQZK 


stage in which an honest verifier simulator is assumed. Then, by using a technique 
of [7], each stage can be converted to obtain zero-knowledge against any classical 
verifier. Finally, Watrous’ quantum rewinding lemma is applied in each stage to 
prove zero-knowledge also against any quantum verifier. 

Here, we propose a simpler transformation from non-interactive (quantum)
zero-knowledge (NIZK) to interactive quantum zero-knowledge (IQZK) by
combining the Coin-Flip Protocol with any NIZK Protocol. Our coin-flipping
generates a truly random coin even in the case of a malicious quantum verifier.
A sequence of such coins can then be used in any subsequent NIZK Protocol,
which is also secure against quantum verifiers, since it is mono-directional.
Here, we define a (NIZK)-subprotocol as given in [2]: Both parties A and B get
common input x. A common reference string ω of size k allows the prover A,
who knows a witness w, to give a non-interactive zero-knowledge proof π(ω, x) to
a (quantum-) computationally bounded verifier B. By definition, the
(NIZK)-subprotocol is complete and sound and satisfies zero-knowledge.

The IQZK Protocol is shown in Figure 7. To prove that it is an interactive
quantum zero-knowledge protocol, we first construct an intermediate
IQZK^{F_COIN} Protocol (see Fig. 5) that runs with the ideal functionality F_COIN.
Then we prove that the IQZK^{F_COIN} Protocol satisfies completeness, soundness
and zero-knowledge according to standard definitions. Finally, by replacing the
calls to F_COIN with our Coin-Flip Protocol, we can complete the transformation
to the final IQZK Protocol.


Completeness: If x ∈ L, the probability that (A, B) rejects x is negligible in the
length of x.

From the ideal functionality F_COIN it follows that each coin_i in Step 1 is
uniformly random for all i = 1,...,k. Hence, ω in Step 2 is a uniformly random
common reference string of size k. By definition of any (NIZK)-subprotocol, we
have acceptance probability

\[
\Pr\left[ \omega \in_R \{0,1\}^{k},\ \pi(\omega,x) \leftarrow A(\omega,x,w) : B(\omega,x,\pi(\omega,x)) = 1 \right] > 1 - \varepsilon'' ,
\]

where ε'' is negligible in the length of x. Thus, completeness for the
IQZK^{F_COIN} Protocol follows.


Soundness: If x ∉ L, then for any unbounded prover A*, the probability that
(A*, B) accepts x is negligible in the length of x.

Any dishonest A* might stop the IQZK^{F_COIN} Protocol at any point during
execution. For example, she can block the output in Step 1 or she can refuse to
send a proof π in the (NIZK)-subprotocol. Furthermore, A* can use an invalid ω
(or x) for π. In all of these cases, B will abort without even checking the proof.
Therefore, A*’s best strategy is to “play the entire game”, i.e. to execute the
entire IQZK^{F_COIN} Protocol without making obvious cheats.

A* can only convince B in the (NIZK)-subprotocol of a π for any given (i.e.
normally generated) ω with negligible probability

\[
\Pr\left[ \omega \in_R \{0,1\}^{k},\ \pi(\omega,x) \leftarrow A^{*}(\omega,x) : B(\omega,x,\pi(\omega,x)) = 1 \right] .
\]

Therefore, the probability that A* can convince B in the entire IQZK^{F_COIN}
Protocol in case of x ∉ L is also negligible (in the length of x) and its
soundness follows.


Zero-Knowledge: An interactive proof system (A, B*) for language L is quantum
zero-knowledge, if for any quantum verifier B*, there exists a simulator
S_{IQZK^{F_COIN}} such that S_{IQZK^{F_COIN}} ≈_q (A, B*) on common input x ∈ L
and arbitrary additional (quantum) input to B*.

We construct simulator S_{IQZK^{F_COIN}}, interacting with dishonest B* and
simulator S_NIZK. Under the assumption on the zero-knowledge property of any
NIZK Protocol, there exists a simulator S_NIZK that, on input x ∈ L, generates a
randomly looking ω together with a valid proof π for x (without knowing witness
w). S_{IQZK^{F_COIN}} is described in Figure 6. It receives a random string ω
from S_NIZK, which now replaces the string of coins produced by the calls to
F_COIN in the IQZK^{F_COIN} Protocol. The “merging” of coins into ω in Step 2 of
the protocol (Fig. 5) is equivalent to the “splitting” of ω into coins in Step 3
of the simulation (Fig. 6). Thus, the simulated proof π(ω, x) is
indistinguishable from a real proof, which shows that the IQZK^{F_COIN} Protocol
is zero-knowledge.


S_{IQZK^{F_COIN}}:


1. S_{IQZK^{F_COIN}} gets input x.
2. It invokes S_NIZK with x and receives π(ω, x).
3. Let ω = coin_1 ... coin_k. S_{IQZK^{F_COIN}} sends each coin_i one by one to B*.
4. S_{IQZK^{F_COIN}} sends π(ω, x) to B* and outputs whatever B* outputs.


Fig. 6. The Simulation of the Intermediate Protocol for IQZK 




IQZK Protocol:


(CFP) For all i = 1,...,k repeat Steps 1 to 4.
1. A chooses a_i ∈_R {0,1} and computes com(a_i, r_i). She sends com(a_i, r_i) to B.
2. B chooses b_i ∈_R {0,1} and sends b_i to A.
3. A sends open(a_i, r_i) and B checks if the opening is valid.
4. Both compute coin_i = a_i ⊕ b_i.
(CRS)
5. A and B compute ω = coin_1 ... coin_k.
(NIZK)
6. A sends π(ω, x) to B. B checks the proof and accepts or rejects accordingly.


Fig. 7. Interactive Quantum Zero-Knowledge 
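The (CFP) and (CRS) parts of Fig. 7 amount to running the coin flip k times and concatenating the results. The sketch below illustrates this under assumptions: `com` is a hypothetical hash-based stand-in for the commitment scheme, `refuse_at` is an illustrative parameter modelling an Alice who withholds one opening, and the (NIZK) step is left out because it depends on the chosen proof system.

```python
import hashlib
import secrets


class ProtocolAbort(Exception):
    """Raised when B aborts because an opening is missing or invalid."""


def com(a: int, r: bytes) -> bytes:
    """Toy commitment (hash-based stand-in, not the scheme of the paper)."""
    return hashlib.sha256(bytes([a]) + r).digest()


def generate_crs(k: int, refuse_at=None) -> str:
    """Run k coin flips (Steps 1 to 4) and merge them into omega (Step 5).

    refuse_at: optional round index (0-based) in which Alice sends no opening.
    """
    coins = []
    for i in range(k):
        a = secrets.randbits(1)                          # Step 1
        r = secrets.token_bytes(16)
        c = com(a, r)
        b = secrets.randbits(1)                          # Step 2
        opening = None if i == refuse_at else (a, r)     # Step 3
        if opening is None or com(*opening) != c:
            raise ProtocolAbort(f"no valid opening in round {i + 1}")
        coins.append(a ^ b)                              # Step 4
    return "".join(str(bit) for bit in coins)            # Step 5
```

A call such as `generate_crs(128)` returns a 128-character bit string that would then serve as the common reference string of the (NIZK)-subprotocol.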


It would be natural to think that the IQZK Protocol could be proved secure
simply by showing that the IQZK^{F_COIN} Protocol implements some appropriate
functionality and then use the composition theorem from [8]. Unfortunately, a
zero-knowledge protocol (which is not necessarily a proof of knowledge) cannot
be modeled by a functionality in a natural way. We therefore instead prove
explicitly that the IQZK Protocol has the standard properties of a zero-knowledge
proof as follows.


Completeness: From the analysis of the Coin-Flip Protocol and its
indistinguishability from the ideal functionality F_COIN, it follows that if both
players honestly choose random bits, each coin_i for all i = 1,...,k in the
(CFP)-subprotocol is generated uniformly at random. Thus, ω is a random common
reference string of size k and the acceptance probability of the
(NIZK)-subprotocol as given above holds. Completeness for the IQZK Protocol
follows.


Soundness: Again, we only consider the case where A* executes the entire
protocol without making obvious cheats, since otherwise, B immediately aborts.
Assume that A* could cheat in the IQZK Protocol, i.e., B would accept an invalid
proof with non-negligible probability. Then we could combine A* with simulator
Â* of the Coin-Flip Protocol (Fig. 3) to show that the IQZK^{F_COIN} Protocol
was not sound. This, however, is inconsistent with the previously given soundness
argument and thus proves by contradiction that the IQZK Protocol is sound.


Zero-Knowledge: A simulator Ŝ_IQZK can be composed of simulator S_{IQZK^{F_COIN}}
(Fig. 6) and simulator B̂* for the Coin-Flip Protocol (Fig. 4). Ŝ_IQZK gets
classical input x as well as quantum input W and X. It then receives a valid proof
π and a random string ω from S_NIZK. As in S_{IQZK^{F_COIN}}, ω is split into
coin_1 ... coin_k. For each coin_i, it will then invoke B̂* to simulate one
coin-flip execution with coin_i as result. In other words, whenever B̂* asks
F_COIN to output a bit (Step 2, Fig. 4), it instead receives this coin_i. The
transcript of the simulation, i.e. π(ω, x) as well as
(com(a_i, r_i), b_i, open(a_i, r_i)) ∀ i = 1,...,k and ω = coin_1 ... coin_k, is
indistinguishable from the transcript of the IQZK Protocol for any
quantum-computationally bounded B*, which concludes the zero-knowledge proof.


4.2 Generating Commitment Keys for Improved Quantum 
Protocols 


Recently, Damgård et al. [4] proposed a general compiler for improving the
security of a large class of quantum protocols. Alice starts such protocols by
transmitting random BB84-qubits to Bob who measures them in random bases. Then
some classical messages are exchanged to accomplish different cryptographic 
tasks. The original protocols are typically unconditionally secure against cheat- 
ing Alice, and secure against a so-called benignly dishonest Bob, i.e., Bob is 
assumed to handle most of the received qubits as he is supposed to. Later on in 
the protocol, he can deviate arbitrarily. The improved protocols are then secure 
against an arbitrary computationally bounded (quantum) adversary. The com- 
pilation also preserves security in the bounded-quantum-storage model (BQSM) 
that assumes the quantum storage of the adversary to be of limited size. If the 
original protocol was BQSM-secure, the improved protocol achieves hybrid secu- 
rity, i.e., it can only be broken by an adversary who has large quantum memory 
and large computing power. 

Briefly, the argument for computational security proceeds along the following
lines. After the initial qubit transmission from A to B, B commits to all his
measurement bases and outcomes. The (keyed) dual-mode commitment scheme
that is used must have the special property that the key can be generated
by one of two possible key-generation algorithms: G_H or G_B. Depending on the
key in use, the scheme provides both flavors of security. Namely, with key pkH
generated by G_H, respectively pkB produced by G_B, the commitment scheme is
unconditionally hiding, respectively unconditionally binding. Furthermore, the
scheme is secure against a quantum adversary and it holds that pkH ≈_q pkB,
i.e. the two keys are quantum-computationally indistinguishable. The
commitment construction is described in full detail in [4].

In the real-life protocol, B uses the unconditionally hiding key pkH to maintain
unconditional security against any unbounded A*. To argue security against
a computationally bounded B*, an information-theoretic argument involving
simulator B̂' (see [4]) is given to prove that B* cannot cheat with the
unconditionally binding key pkB. Security in real life then follows from the
quantum-computational indistinguishability of pkH and pkB.

The CRS model is assumed to achieve high efficiency and practicability. Here,
we discuss integrating the generation of a common reference string from scratch
based on our quantum-secure coin-flipping. Thus, we can implement the entire
process in the quantum world, starting with the generation of a CRS without any
initially shared information and using it during compilation as commitment key.³


3 Note that implementing the entire process comes at the cost of a non constant-round 
construction, added to otherwise very efficient protocols under the CRS-assumption. 
As mentioned in [4], a dual-mode commitment scheme can be constructed from
the lattice-based cryptosystem of Regev [I3]. It is based on the learning with
errors problem, which can be reduced from worst-case (quantum) hardness of the
(general) shortest vector problem. Hence, breaking Regev’s cryptosystem implies
an efficient algorithm for approximating the lattice problem, which is assumed to
be hard even quantumly. Briefly, the cryptosystem uses dimension k as security
parameter and is parametrized by two integers m and p, where p is a prime,
and a probability distribution on Z_p. A regular public key for Regev’s scheme is
indistinguishable from a case where a public key is chosen independently from
the secret key, and in this case, the ciphertext carries essentially no information
about the message. Thus, the public key of a regular key pair can be used as the
unconditionally binding key pkB' in the commitment scheme for the ideal-world
simulation. Then for the real protocol, an unconditionally hiding commitment
key pkH' can simply be constructed by uniformly choosing numbers in
Z_p^k × Z_p. Both public keys will be of size O(mk log p), and the encryption
process involves only modular additions, which makes its use simple and
efficient.
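The two kinds of public key can be made concrete with a toy version of Regev-style bit encryption. Everything below is a sketch under assumptions: the parameters `K`, `M`, `P` are tiny and insecure, the error is drawn from {-1, 0, 1} instead of the distribution used by Regev, and the commitment scheme built on top of the cryptosystem in [4] is not reproduced. The point is only to show a regular key (decryptable, corresponding to the binding mode), a uniformly random key of the same shape (corresponding to the hiding mode), and encryption by modular additions.

```python
import secrets

K, M, P = 8, 20, 257  # toy dimension k, number of samples m, prime p (insecure)


def _rand_vec():
    return [secrets.randbelow(P) for _ in range(K)]


def keygen_regular():
    """Regular key pair: pk consists of M noisy inner products with secret s."""
    s = _rand_vec()
    pk = []
    for _ in range(M):
        a = _rand_vec()
        e = secrets.randbelow(3) - 1                     # toy error in {-1, 0, 1}
        b = (sum(x * y for x, y in zip(a, s)) + e) % P
        pk.append((a, b))
    return pk, s


def keygen_uniform():
    """Key of the same shape, sampled uniformly from Z_p^k x Z_p."""
    return [(_rand_vec(), secrets.randbelow(P)) for _ in range(M)]


def encrypt(pk, bit: int):
    """Add up a random subset of the samples; shift by floor(p/2) for bit 1."""
    subset = [row for row in pk if secrets.randbits(1)]
    a = [sum(row[0][j] for row in subset) % P for j in range(K)]
    b = (sum(row[1] for row in subset) + bit * (P // 2)) % P
    return a, b


def decrypt(sk, ct) -> int:
    """Decide whether b - <a, s> is closer to 0 or to floor(p/2) modulo p."""
    a, b = ct
    d = (b - sum(x * y for x, y in zip(a, sk))) % P
    return 0 if min(d, P - d) < P // 4 else 1
```

With these toy parameters the accumulated error is at most 20 in absolute value, well below p/4, so decryption under a regular key always recovers the bit; under a uniform key there is no secret key to decrypt with.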

The idea is now the following. We add (at least) k executions of our
Coin-Flip Protocol as a first step to the construction of [4] to generate
a uniformly random sequence coin_1 ... coin_k. These k random bits produce a
pkH' as sampled by G_H, except with negligible probability. Hence, in the
real-world, Bob can use coin_1 ... coin_k = pkH' as key for committing to all his
basis choices and measurement outcomes. Since an ideal-world adversary B̂' is free
to choose any key, it can generate (pkB', sk'), i.e. a regular public key together
with a secret key according to Regev’s cryptosystem. For the security proof,
write pkB' = coin_1 ... coin_k. In the simulation, B̂' first invokes B̂* for each
coin_i to simulate one coin-flip execution with coin_i as result. As before,
whenever B̂* asks F_COIN to output a bit, it instead receives this coin_i. Then
B̂' has the possibility to decrypt dishonest B*’s commitments during simulation,
which binds B* unconditionally to his committed measurement bases and outcomes.
Finally, as we proved in the analysis of the Coin-Flip Protocol that pkH' is a
uniformly random string, Regev’s proof of semantic security shows that
pkH' ≈_q pkB', and (quantum-) computational security of the real protocols in
[4] follows.


5 On Efficient Simulation in the CRS Model 


For our Coin-Flip Protocol in the plain model, we cannot claim universal
composability. As already mentioned, in case of unconditional security against
dishonest A* according to Definition 2.2, we do not require the simulator to be
efficient. In order to achieve efficient simulation, Â* must be able to extract the
choice bit efficiently out of A*’s commitment, such that A*’s input is defined
after this step. The standard approach to do this is to give the simulator some
trapdoor information related to the common reference string, that A* does not
have in real life. Therefore, we extend the commitment scheme to build in such
a trapdoor and ensure efficient extraction. To further guarantee UC-security,
we circumvent the necessity of rewinding B* by extending the construction also
with respect to equivocability.

We will adapt an approach to our set-up, which is based on the idea of
UC-commitments [B] and already discussed in the full version of [4]. We require a
Σ-protocol for a (quantumly) hard relation R = {(x, w)}, i.e. an honest verifier
perfect zero-knowledge interactive proof of knowledge, where the prover shows
that he knows a witness w such that the problem instance x is in the language
L ((x, w) ∈ R). Conversations are of form (a^Σ, c^Σ, z^Σ), where the prover sends
a^Σ, the verifier challenges him with bit c^Σ, and the prover replies with z^Σ.
For practical candidates of R, see e.g. [5]. Instead of the simple commitment
scheme, we use the keyed dual-mode commitment scheme described in Section 4.2
but now based on a multi-bit version of Regev’s scheme [LZ]. Still we construct
it such that depending on the key pkH or pkB, the scheme provides both flavors
of security and it holds that pkH ≈_q pkB.

In real life, the CRS consists of commitment key pkB and an instance x' for
which it holds that ∄ w' such that (x', w') ∈ R, where we assume that x ≈_q x'.
To commit to bit a, A runs the honest verifier simulator to get a conversation
(a^Σ, a, z^Σ). She then sends a^Σ and two commitments c_0, c_1 to B, where
c_a = com_pkB(z^Σ, r) and c_{1−a} = com_pkB(0^{z'}, r') with randomness r, r'
and z' = |z^Σ|. Then, a, z^Σ, r are sent to open the relevant one of c_0 or c_1,
and B checks that (a^Σ, a, z^Σ) is an accepting conversation. Assuming that the
Σ-protocol is honest verifier zero-knowledge and pkB leads to unconditionally
binding commitments, the new commitment construction is again unconditionally
binding.

During simulation, Â* chooses a pkB in the CRS such that it knows the matching
decryption key sk. Then, it can extract A*’s choice bit a by decrypting both
c_0 and c_1 and checking which contains a valid z^Σ such that (a^Σ, a, z^Σ) is
accepting. Note that not both c_0 and c_1 can contain a valid reply, since
otherwise, A* would know a w' such that (x', w') ∈ R. In order to simulate in
case of B*, B̂* chooses the CRS as pkH and x. Hence, the commitment is
unconditionally hiding. Furthermore, it can be equivocated, since ∃ w with
(x, w) ∈ R and therefore, c_0, c_1 can both be computed with valid replies, i.e.
c_0 = com_pkH(z_0^Σ, r) and c_1 = com_pkH(z_1^Σ, r'). Quantum-computational
security against B* follows from the indistinguishability of the keys pkB and
pkH and the indistinguishability of the instances x and x', and efficiency of
both simulations is ensured due to extraction and equivocability.
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A Watrous’ Quantum Rewinding Lemma 


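Lemma 1 (Quantum Rewinding Lemma with small perturbations [20]).
Let $Q$ be the unitary $(n,k)$-quantum circuit as given in BW. Furthermore, let
$p_0, q \in (0,1)$ and $\varepsilon \in (0,\frac{1}{2})$ be real numbers such that

1. $|p - q| < \varepsilon$,
2. $p_0(1 - p_0) \le q(1 - q)$, and
3. $p_0 \le p$

for all $n$-qubit states $|\psi\rangle$. Then there exists a general quantum
circuit $R$ of size

$$O\!\left(\frac{\log(1/\varepsilon)\,\mathrm{size}(Q)}{p_0(1 - p_0)}\right)$$

such that, for every $n$-qubit state $|\psi\rangle$, the output $\rho(\psi)$ of
$R$ satisfies

$$\langle \phi_{good}(\psi) |\, \rho(\psi) \,| \phi_{good}(\psi) \rangle \ge 1 - \varepsilon'$$

where $\varepsilon' = 16\,\varepsilon\,\frac{\log^2(1/\varepsilon)}{p_0^2 (1 - p_0)^2}$.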

Note that $p_0$ denotes the lower bound on the success probability $p$, for which
the procedure guarantees correctness. Furthermore, for negligible $\varepsilon$
but non-negligible $p_0$, it follows that $\varepsilon'$ is negligible. For a
more detailed description of the lemma and the corresponding proofs, we refer
to [20].

Abstract. We study quantum protocols among two distrustful parties. Under the
sole assumption of correctness (guaranteeing that honest players obtain their
correct outcomes), we show that every protocol
implementing a non-trivial primitive necessarily leaks information to a 
dishonest player. This extends known impossibility results to all non- 
trivial primitives. We provide a framework for quantifying this leakage 
and argue that leakage is a good measure for the privacy provided to the 
players by a given protocol. Our framework also covers the case where 
the two players are helped by a trusted third party. We show that de- 
spite the help of a trusted third party, the players cannot amplify the 
cryptographic power of any primitive. All our results hold even against 
quantum honest-but-curious adversaries who honestly follow the proto- 
col but purify their actions and apply a different measurement at the 
end of the protocol. As concrete examples, we establish lower bounds on 
the leakage of standard universal two-party primitives such as oblivious 
transfer. 


Keywords: two-party primitives, quantum protocols, quantum informa- 
tion theory, oblivious transfer. 


1 Introduction 


Quantum communication makes it possible to implement tasks which are classically
impossible. The most prominent example is quantum key distribution H| where two
honest players establish a secure key against an eavesdropper. In the two-party
setting however, quantum and classical cryptography often show similar limits.
Oblivious transfer [22], bit commitment [24,23], and even fair coin tossing
are impossible to realize securely both classically and quantumly. On the other
hand, quantum cryptography allows for some weaker primitives impossible in
the classical world. For example, quantum coin-flipping protocols with maximum
bias of $\frac{1}{\sqrt{2}} - \frac{1}{2}$ exist (see Footnote 1) against any
adversary [8] while remaining impossible
based solely on classical communication. A few other weak primitives are known 
to be possible with quantum communication. For example, the generation of an 
additive secret-sharing for the product xy of two bits, where Alice holds bit x and 
Bob bit y, has been introduced by Popescu and Rohrlich as machines modeling 
non-signaling non-locality (also called NL-boxes) [29]. If Alice and Bob share 
an EPR pair, they can simulate an NL-box with symmetric error probability 
$\sin^2\frac{\pi}{8}$ BJB]. Equivalently, Alice and Bob can implement 1-out-of-2 oblivious
transfer (1-2-OT) privately provided the receiver Bob gets the bit of his choice
only with probability of error $\sin^2\frac{\pi}{8}$ [1]. It is easy to verify that even with such
imperfection these two primitives are impossible to realize in the classical world. 
This discussion naturally leads to the following question: 


— Which two-party cryptographic primitives are possible to achieve using quan- 
tum communication? 


Most standard classical two-party primitives have been shown impossible to
implement securely against weak quantum adversaries reminiscent of the classical
honest-but-curious (HBC) behavior [22]. The idea behind these impossibility
proofs is to consider parties that purify their actions throughout the protocol
execution. This behavior is indistinguishable from the one specified by the
protocol but guarantees that the joint quantum state held by Alice and Bob at any
point during the protocol remains pure. The possibility for players to behave that
way in any two-party protocol has important consequences. For instance, the
impossibility of quantum bit commitment follows from this fact [24,23]: After the
commit phase, Alice and Bob share the pure state
$|\psi^x\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ corresponding to the
commitment of bit $x$. Since a proper commitment scheme provides no information
about $x$ to the receiver Bob, it follows that
$\mathrm{tr}_A |\psi^0\rangle\langle\psi^0| = \mathrm{tr}_A |\psi^1\rangle\langle\psi^1|$.
In this case, the Schmidt decomposition guarantees that there exists a unitary
$U_{0,1}$ acting only on Alice's side such that
$|\psi^1\rangle = (U_{0,1} \otimes I_B)|\psi^0\rangle$. In other words,
if the commitment is concealing then Alice can open the bit of her choice by
applying a suitable unitary transform only to her part. A similar argument
shows that 1-2-OT is impossible [22]: Suppose Alice is sending the
pair of bits $(b_0, b_1)$ to Bob through 1-2-OT. Since Alice does not learn Bob's
selection bit, it follows that Bob can get bit $b_0$ before undoing the reception of
$b_0$ and transforming it into the reception of $b_1$ using a local unitary transform
similar to $U_{0,1}$ for bit commitment. For both these primitives, privacy for one
player implies that local actions by the other player can transform the honest
execution with one input into the honest execution with another input.
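
The opening attack on bit commitment described above can be checked numerically
on a toy example. The following Python sketch is only an illustration: the
function names and the two example commitment states are assumptions made here,
not taken from the cited proofs. A bipartite pure state is stored as a
coefficient matrix M with $|\psi\rangle = \sum_{a,b} M[a,b]\,|a\rangle|b\rangle$,
and the unitary is computed from a singular value decomposition (the unitary
Procrustes solution) instead of an explicit Schmidt decomposition.

```python
import numpy as np

def reduced_state_bob(M):
    """Bob's reduced state tr_A |psi><psi| for |psi> = sum_{a,b} M[a, b] |a>|b>."""
    return M.T @ M.conj()

def alice_cheating_unitary(M0, M1):
    """Unitary U on Alice's side with (U tensor I)|psi^0> = |psi^1>.

    Assumes both states leave Bob with the same reduced state (perfect hiding).
    (U tensor I) acts on the coefficient matrix as M -> U @ M.
    """
    W, _, Vh = np.linalg.svd(M1 @ M0.conj().T)
    return W @ Vh

# Hypothetical perfectly hiding commitment states:
# bit 0 -> (|00> + |11>)/sqrt(2), bit 1 -> (|01> + |10>)/sqrt(2).
M0 = np.eye(2) / np.sqrt(2)
M1 = np.array([[0.0, 1.0], [1.0, 0.0]]) / np.sqrt(2)

U = alice_cheating_unitary(M0, M1)
hiding = np.allclose(reduced_state_bob(M0), reduced_state_bob(M1))
switched = np.allclose(U @ M0, M1)  # Alice alone turns |psi^0> into |psi^1>
```

In this example both `hiding` and `switched` hold, which mirrors the argument:
whenever Bob's reduced states coincide, a local unitary on Alice's side maps one
commitment to the other.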

In this paper, we investigate the cryptographic power of two-party quantum
protocols against players that purify their actions. This quantum
honest-but-curious (QHBC) behavior is the natural quantum version of classical HBC
behavior. We consider the setting where Alice obtains random variable $X$ and
Bob random variable $Y$ according to the joint probability distribution $P_{X,Y}$.
Any $P_{X,Y}$ models a two-party cryptographic primitive where neither Alice nor
Bob provides input. For the purpose of this paper, this model is general enough
since any two-party primitive with inputs can be randomized (Alice and Bob
pick their input at random) so that its behavior can be described by a suitable
joint probability distribution $P_{X,Y}$. If the randomized version $P_{X,Y}$ is shown
to be impossible to implement securely by any quantum protocol then also the
original primitive with inputs is impossible.

1 In fact, protocols with better bias are known for weak quantum coin flipping
[2082692 A.

Any quantum protocol implementing $P_{X,Y}$ must produce, when both parties
purify their actions, a joint pure state
$|\psi\rangle \in \mathcal{H}_{AA'} \otimes \mathcal{H}_{BB'}$ that, when subsystems
$A$ and $B$ are measured in the computational basis, leads to outcomes $X$ and $Y$
according to the distribution $P_{X,Y}$. Notice that the registers $A'$ and $B'$ only provide
the players with extra working space and, as such, do not contribute to the output
of the functionality (so parties are free to measure them the way they want).
In this paper, we adopt a somewhat strict point of view and define a quantum
protocol $\pi$ for $P_{X,Y}$ to be correct if and only if the correct outcomes $X, Y$ are
obtained and the registers $A'$ and $B'$ do not provide any additional information
about $Y$ and $X$ respectively, since otherwise $\pi$ would be implementing a different
primitive $P_{XX',YY'}$ rather than $P_{X,Y}$.

The state $|\psi\rangle$ produced by any correct protocol for $P_{X,Y}$ is called a quantum
embedding of $P_{X,Y}$. An embedding is called regular if the registers $A'$ and $B'$ are
empty. Any embedding $|\psi\rangle \in \mathcal{H}_{AA'} \otimes \mathcal{H}_{BB'}$ can be produced in the QHBC model
by the trivial protocol asking Alice to generate $|\psi\rangle$ before sending the quantum
state in $\mathcal{H}_{BB'}$ to Bob. Therefore, it is sufficient to investigate the cryptographic
power of embeddings in order to understand the power of two-party quantum
cryptography in the QHBC model.

Notice that if $X$ and $Y$ were provided privately to Alice and Bob (through
a trusted third party, for instance) then the expected amount of information
one party gets about the other party's output is minimal and can be quantified
by the Shannon mutual information $I(X;Y)$ between $X$ and $Y$. Assume that
$|\psi\rangle \in \mathcal{H}_{AA'} \otimes \mathcal{H}_{BB'}$ is the embedding of $P_{X,Y}$ produced by a correct quantum
protocol. We define the leakage of $|\psi\rangle$ as

$$\Delta_\psi := \max\{\, S(X; BB') - I(X;Y),\; S(Y; AA') - I(Y;X) \,\}, \qquad (1)$$

where $S(X; BB')$ (resp. $S(Y; AA')$) is the information the quantum registers
$BB'$ (resp. $AA'$) provide about the output $X$ (resp. $Y$). That is, the leakage is the
maximum amount of extra information about the other party's output given the
quantum state held by one party. It turns out that $S(X; BB') = S(Y; AA')$ holds
for all embeddings, exhibiting a symmetry similar to its classical counterpart
$I(X;Y) = I(Y;X)$ and therefore, the two quantities we are taking the maximum
of (in the definition of leakage above) coincide.
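
To make definition (1) concrete, the following Python sketch evaluates the
leakage for embeddings without working registers ($A'$, $B'$ empty). It relies
on one observation that is not spelled out in the text: once $X$ is fixed,
Bob's register is in a pure state, so $S(X;B) = S(\rho_B)$. The function names
and the encoding of randomized 1-2-OT as $X = (b_0, b_1)$, $Y = (c, b_c)$ with
uniform bits are assumptions made for this illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def leakage_no_workspace(P, theta=None):
    """S(X;B) - I(X;Y) for the state sum_{x,y} e^{i theta} sqrt(P[x,y]) |x>|y>."""
    P = np.asarray(P, dtype=float)
    theta = np.zeros_like(P) if theta is None else np.asarray(theta, dtype=float)
    M = np.exp(1j * theta) * np.sqrt(P)      # amplitudes M[x, y]
    rho_B = M.T @ M.conj()                   # Bob's reduced state
    s_xb = entropy(np.linalg.eigvalsh(rho_B))  # equals S(X;B): Bob's state given x is pure
    mutual = entropy(P.sum(axis=1)) + entropy(P.sum(axis=0)) - entropy(P)
    return s_xb - mutual

# Randomized 1-2-OT: x = 2*b0 + b1, y = 2*c + b_c, all bits uniform.
P_ot = np.zeros((4, 4))
for b0 in (0, 1):
    for b1 in (0, 1):
        for c in (0, 1):
            P_ot[2 * b0 + b1, 2 * c + (b0, b1)[c]] = 1 / 8

leak_ot = leakage_no_workspace(P_ot)                          # all phases zero
leak_coin = leakage_no_workspace(np.diag([0.5, 0.5]))         # shared random bit
leak_indep = leakage_no_workspace(np.full((2, 2), 0.25))      # independent bits
```

With all phases set to zero, the 1-2-OT state evaluates to a leakage of 1/2
bit, while the shared coin and the pair of independent bits give leakage 0.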


CONTRIBUTIONS. Our first contribution establishes that the notion of leakage
is well behaved. We show that the leakage of any embedding for $P_{X,Y}$ is lower
bounded by the leakage of some regular embedding of the same primitive. Thus,
in order to lower bound the leakage of any correct implementation of a given
primitive, it suffices to minimize the leakage over all its regular embeddings. We
also show that the only non-leaking embeddings are the ones for trivial
primitives, where a primitive $P_{X,Y}$ is said to be (cryptographically) trivial if it can be
generated by a classical protocol against HBC adversaries (see Footnote 2). It follows that any
quantum protocol implementing a non-trivial primitive $P_{X,Y}$ must leak
information under the sole assumption that it produces $(X,Y)$ with the right joint
distribution. This extends known impossibility results for two-party primitives
to all non-trivial primitives.

Embeddings of primitives arise from protocols where Alice and Bob have full 
control over the environment. Having in mind that any embedding of a non- 
trivial primitive leaks information, it is natural to investigate what tasks can be 
implemented without leakage with the help of a trusted third party. The notion 
of leakage can easily be adapted to this scenario. We show that no cryptographic 
two-party primitive can be implemented without leakage with just one call to the 
ideal functionality of a weaker primitive (see Footnote 3). This new impossibility result does not
follow from the ones known since they all assume that the state shared between 
Alice and Bob is pure. 

We then turn our attention to the leakage of correct protocols for a few
concrete universal primitives. From the results described above, the leakage of any
correct implementation of a primitive can be determined by finding the (regular)
embedding that minimizes the leakage. In general, this is not an easy task since
it requires finding the eigenvalues of the reduced density matrix $\rho_A = \mathrm{tr}_B |\psi\rangle\langle\psi|$
(or equivalently $\rho_B = \mathrm{tr}_A |\psi\rangle\langle\psi|$). As far as we know, no known results allow
us to obtain a non-trivial lower bound on the leakage (which is the difference
between the mutual information and accessible information) of non-trivial
primitives. One reason is that in our setting we need to lower bound this difference
with respect to a measurement in one particular basis. However, when $P_{X,Y}$ is
such that the bit-length of either $X$ or $Y$ is short, the leakage can be computed
precisely. We show that any correct implementation of 1-2-OT necessarily leaks
$\frac{1}{2}$ bit. Since NL-boxes and 1-2-OT are locally equivalent, the same minimal
leakage applies to NL-boxes [B8]. This is a stronger impossibility result than the
one by Lo [22] since he assumes perfect/statistical privacy against one party
while our approach only assumes correctness (while both approaches apply even
against QHBC adversaries). We finally show that for Rabin-OT and 1-2-OT of
$r$-bit strings (i.e. $\mathrm{ROT}^r$ and 1-2-$\mathrm{OT}^r$ respectively), the leakage approaches 1
exponentially in $r$. In other words, correct implementations of these two primitives
trivialize as $r$ increases since the sender gets almost all information about Bob's
reception of the string (in case of $\mathrm{ROT}^r$) and Bob's choice bit (in case of 1-2-$\mathrm{OT}^r$).
These are the first quantitative impossibility results for these primitives and
certainly the first time the hardness of implementing different flavors of string OTs
is shown to increase as the strings to be transmitted get longer.

2 We are aware of the fact that our definition of triviality encompasses
cryptographically interesting primitives like coin-tossing and generalizations thereof for which
highly non-trivial protocols exist B78]. However, the important fact (for the
purpose of this paper) is that all these primitives can be implemented by trivial classical
protocols against HBC adversaries.

3 The weakness of a primitive will be formally defined in terms of entropic monotones
for classical two-party computation introduced by Wolf and Wullschleger [96], see
Section ÆJ

Finally, we note that our lower bounds on the leakage of the randomized
primitives also lower-bound the minimum leakage for the standard versions of these
primitives where the players choose their inputs uniformly at random. While
we focus on the typical case where the primitives are run with uniform inputs, 
the same reasoning can be applied to primitives with arbitrary distributions 
of inputs. 


RELATED WORK. Our framework makes it possible to quantify the minimum amount of
leakage whereas standard impossibility proofs as the ones of B3IBÆR2DPI/] do
not in general provide such quantification since they usually assume privacy
for one player in order to show that the protocol must be totally insecure for
the other player. By contrast, we derive lower bounds for the leakage of any
correct implementation. At first glance, our approach seems contradictory with
standard impossibility proofs since embeddings leak the same amount towards
both parties. To resolve this apparent paradox it suffices to observe that in
previous approaches only the adversary purified its actions whereas in our case
both parties do. If an honest player does not purify his actions then some leakage
may be lost by the act of irreversibly and unnecessarily measuring some of his
quantum registers.

Our results complement the ones obtained by Colbeck in [10] for the setting
where Alice and Bob have inputs and obtain identical outcomes (called
single-function computations). [10] shows that in any correct implementation of
primitives of a certain form, an honest-but-curious player can access more
information about the other party's input than is available through the ideal
functionality. Unlike [10], we deal in our work with the case where Alice and
Bob do not have inputs but might receive different outputs according to a joint
probability distribution. We show that only trivial distributions can be
implemented securely in the QHBC model. Furthermore, we introduce a quantitative
measure of protocol-insecurity that lets us answer which embeddings allow the
least effective cheating.

Another notion of privacy in quantum protocols, generalizing its classical
counterpart from DZI], is proposed by Klauck in [19]. Therein, two-party quantum
protocols with inputs for computing a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$, where $\mathcal{X}$ and
$\mathcal{Y}$ denote Alice's and Bob's respective input spaces, and privacy against QHBC
adversaries are considered. Privacy of a protocol is measured in terms of privacy
loss, defined for each round of the protocol and fixed distribution of inputs $P_{X',Y'}$
by $S(B; X|Y) = H(X|Y) - S(X|B,Y)$, where $B$ denotes Bob's private working
register, and $X := (X', f(X',Y'))$, $Y := (Y', f(X',Y'))$ represent the complete
views of Alice and Bob, respectively. Privacy loss of the entire protocol is then
defined as the supremum over all joint input distributions, protocol rounds,
and states of working registers. In our framework, privacy loss corresponds to
$S(X; YB) - I(X;Y)$ from Alice's point of view and $S(Y; XA) - I(X;Y)$ from
Bob's point of view. Privacy loss is therefore very similar to our definition of
leakage except that it requires the players to get their respective honest outputs.
As a consequence, the protocol implementing $P_{X,Y}$ by asking one party to
prepare a regular embedding of $P_{X,Y}$ before sending her register to the other party
would have no privacy loss. Moreover, the scenario analyzed in [19] is restricted
to primitives which provide the same output $f(X,Y)$ to both players. Another
difference is that since privacy loss is computed over all rounds of a protocol,
a party is allowed to abort, which is not considered QHBC in our setting. In
conclusion, the model of [19] is different from ours even though the measures of
privacy loss and leakage are similar. [19] provides interesting results concerning
trade-offs between privacy loss and communication complexity of quantum
protocols, building upon similar results in the classical scenario. It would be
interesting to know whether a similar operational meaning can also be assigned
to the new measure of privacy introduced in this paper.

4 The definition of leakage of an embedding can be generalized to protocols with inputs,
where it is defined as $\max\{\sup_{V_B} S(X; V_B) - I(X;Y),\ \sup_{V_A} S(V_A; Y) - I(X;Y)\}$,
where $X$ and $Y$ involve both inputs and outputs of Alice and Bob, respectively. The
supremum is taken over all possible (quantum) views $V_A$ and $V_B$ of Alice and Bob
obtained by their (QHBC-consistent) actions (and containing their inputs).

5 Trade-offs between the security for one and the security for the other player have
been considered before, but either the relaxation of security has to be very small BJ
or the trade-offs are restricted to particular primitives such as commitments B6].

A recent result by Künzler et al. shows that two-party functions that are
securely computable against active quantum adversaries form a strict subset of 
the set of functions which are securely computable in the classical HBC model. 
This complements our result that the sets of securely computable functions in 
both HBC and QHBC models are the same. 


ROADMAP. In Section 2 we introduce the cryptographic and information-theoretic
notions and concepts used throughout the paper. We define, motivate, and
analyze the generality of modeling two-party quantum protocols by embeddings in
Section 3 and define triviality of primitives and embeddings. In Section 4 we
define the notion of leakage of embeddings, show basic properties and argue that it is
a reasonable measure of privacy. In Section 5 we explicitly lower bound the
leakage of some universal two-party primitives. Finally, in Section 6 we discuss possible
directions for future research and open questions.


2 Preliminaries 


QUANTUM INFORMATION THEORY. Let $|\psi\rangle_{AB} \in \mathcal{H}_{AB}$ be an arbitrary pure
state of the joint systems $A$ and $B$. The states of these subsystems are
$\rho_A = \mathrm{tr}_B |\psi\rangle\langle\psi|$ and $\rho_B = \mathrm{tr}_A |\psi\rangle\langle\psi|$, respectively. We denote by $S(A) := S(\rho_A)$ and
$S(B) := S(\rho_B)$ the von Neumann entropy (defined as the Shannon entropy of
the eigenvalues of the density matrix) of subsystem $A$ and $B$ respectively. Since
the joint system is in a pure state, it follows from the Schmidt decomposition
that $S(A) = S(B)$ (see e.g. [28]). Analogously to their classical counterparts, we
can define quantum conditional entropy $S(A|B) := S(AB) - S(B)$, and quantum
mutual information $S(A;B) := S(A) + S(B) - S(AB) = S(A) - S(A|B)$. Even
though in general, $S(A|B)$ can be negative, $S(A|B) \ge 0$ is always true if $A$ is
a classical register. Let $R = \{P_X(x), \rho_R^x\}_{x \in \mathcal{X}}$ be an ensemble of states $\rho_R^x$ with
prior probability $P_X(x)$. The average quantum state is $\rho_R = \sum_{x \in \mathcal{X}} P_X(x)\rho_R^x$.
The famous result by Holevo upper-bounds the amount of classical information
about $X$ that can be obtained by measuring $\rho_R$:


Theorem 2.1 (Holevo bound [14,32]). Let $Y$ be the random variable describing
the outcome of some measurement applied to $\rho_R$ for $R = \{P_X(x), \rho_R^x\}_{x \in \mathcal{X}}$.
Then, $I(X;Y) \le S(\rho_R) - \sum_x P_X(x) S(\rho_R^x)$, where equality can be achieved if and
only if $\{\rho_R^x\}_{x \in \mathcal{X}}$ are simultaneously diagonalizable.


Note that if all states in the ensemble are pure and all different then in order to 
achieve equality in the theorem above, they have to form an orthonormal basis 
of the space they span. In this case, the variable Y achieving equality is the 
measurement outcome in this orthonormal basis. 
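
The bound and its equality condition can be illustrated numerically. The sketch
below is an assumption-laden example rather than part of the theorem: the helper
names are hypothetical, the measurement is restricted to a projective
measurement in an orthonormal basis, and the two ensembles ($\{|0\rangle,
|1\rangle\}$ and $\{|0\rangle, |+\rangle\}$ with uniform priors) are chosen only
for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def von_neumann(rho):
    return entropy(np.linalg.eigvalsh(rho))

def holevo_quantity(priors, states):
    """S(rho_R) - sum_x P_X(x) S(rho_R^x) for a list of density matrices."""
    avg = sum(p * rho for p, rho in zip(priors, states))
    return von_neumann(avg) - sum(p * von_neumann(rho) for p, rho in zip(priors, states))

def measured_information(priors, states, basis):
    """I(X;Y) where Y is the outcome of measuring in the columns of `basis`."""
    joint = np.array([[p * np.real(b.conj() @ rho @ b) for b in basis.T]
                      for p, rho in zip(priors, states)])
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
proj = lambda v: np.outer(v, v.conj())
priors = [0.5, 0.5]
comp_basis = np.eye(2)

# Orthonormal ensemble: measuring in that basis attains the bound.
chi_orth = holevo_quantity(priors, [proj(ket0), proj(ket1)])
info_orth = measured_information(priors, [proj(ket0), proj(ket1)], comp_basis)

# Non-orthogonal pure states: this measurement stays strictly below the bound.
chi_non = holevo_quantity(priors, [proj(ket0), proj(plus)])
info_non = measured_information(priors, [proj(ket0), proj(plus)], comp_basis)
```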


DEPENDENT PART. The following definition introduces a random variable
describing the correlation between two random variables $X$ and $Y$, obtained by
collapsing all values $x_1$ and $x_2$ for which $Y$ has the same conditional
distribution, to a single value.


Definition 2.2 (Dependent part [36]). For two random variables $X, Y$, let
$f_X(x) := P_{Y|X=x}$. Then the dependent part of $X$ with respect to $Y$ is defined
as $X \searrow Y := f_X(X)$.


The dependent part $X \searrow Y$ is the minimum random variable among the random
variables computable from $X$ for which $X \leftrightarrow X \searrow Y \leftrightarrow Y$ forms a Markov chain
[36]. In other words, for any random variable $K = f(X)$ such that $X \leftrightarrow K \leftrightarrow Y$
is a Markov chain, there exists a function $g$ such that $g(K) = X \searrow Y$.
Immediately from the definition we get several other properties of $X \searrow Y$ BE:
$H(Y | X \searrow Y) = H(Y|X)$, $I(X;Y) = I(X \searrow Y; Y)$ and $X \searrow Y = X \searrow (Y \searrow X)$.
The second and the third formula yield $I(X;Y) = I(X \searrow Y; Y \searrow X)$.

The notion of dependent part has been further investigated in [[315)87].
Wullschleger and Wolf have shown that quantities $H(X \searrow Y | Y)$ and
$H(Y \searrow X | X)$ are monotones for two-party computation B7]. That is, none of these
values can increase during classical two-party protocols. In particular, if
Alice and Bob start a protocol from scratch then classical two-party protocols
can only produce $(X,Y)$ such that $H(X \searrow Y | Y) = H(Y \searrow X | X) = 0$,
since $H(X \searrow Y | Y) > 0$ if and only if $H(Y \searrow X | X) > 0$ BY. Conversely,
any primitive satisfying $H(X \searrow Y | Y) = H(Y \searrow X | X) = 0$ can be
implemented securely in the honest-but-curious (HBC) model. We call such primitives
trivial (see Footnote 6).
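
The dependent part and the resulting triviality test are straightforward to
compute for small distributions. The following Python sketch is illustrative
only: the function names are hypothetical, the joint distribution is given as a
matrix P[x, y] in which every x is assumed to have positive probability, and
conditional distributions are compared after rounding to ten decimals.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def dependent_part_labels(P):
    """Label each x by its conditional distribution P_{Y|X=x} (rows of P index x)."""
    labels, seen = [], {}
    for row in np.asarray(P, dtype=float):
        key = tuple(np.round(row / row.sum(), 10))
        labels.append(seen.setdefault(key, len(seen)))
    return labels

def h_dep_given_y(P):
    """H(dependent part of X w.r.t. Y | Y) in bits."""
    P = np.asarray(P, dtype=float)
    labels = dependent_part_labels(P)
    Q = np.zeros((max(labels) + 1, P.shape[1]))  # joint distribution of (label, y)
    for x, k in enumerate(labels):
        Q[k] += P[x]
    return entropy(Q) - entropy(Q.sum(axis=0))

def is_trivial(P, tol=1e-9):
    return h_dep_given_y(P) < tol

# Randomized 1-2-OT (assumed encoding): x = 2*b0 + b1, y = 2*c + b_c.
P_ot = np.zeros((4, 4))
for b0 in (0, 1):
    for b1 in (0, 1):
        for c in (0, 1):
            P_ot[2 * b0 + b1, 2 * c + (b0, b1)[c]] = 1 / 8

trivial_coin = is_trivial(np.diag([0.5, 0.5]))      # shared random bit
trivial_indep = is_trivial(np.full((2, 2), 0.25))   # independent bits
trivial_ot = is_trivial(P_ot)
```

Here the shared coin and the independent bits are classified as trivial, while
the 1-2-OT distribution is not.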


6 See Footnote 2 for a caveat about this terminology.


PURIFICATION. All security questions we ask are with respect to (quantum)
honest-but-curious adversaries. In the classical honest-but-curious adversary
model (HBC), the parties follow the instructions of a protocol but store all
information available to them. Quantum honest-but-curious adversaries (QHBC),
on the other hand, are allowed to behave in an arbitrary way that cannot be
distinguished from their honest behavior by the other player.

Almost all impossibility results in quantum cryptography rely upon a quantum
honest-but-curious behavior of the adversary. This behavior consists in purifying
all actions of the honest players. Purifying means that instead of invoking
classical randomness from a random tape, for instance, the adversary relies upon
quantum registers holding all random bits needed. The operations to be
executed from the random outcome are then performed quantumly without fixing
the random outcomes. For example, suppose a protocol instructs a party to pick
with probability $p$ state $|\phi^0\rangle_C$ and with probability $1 - p$ state $|\phi^1\rangle_C$ before
sending it to the other party through the quantum channel $C$. The purified
version of this instruction looks as follows: Prepare a quantum register in state
$\sqrt{p}\,|0\rangle_R + \sqrt{1-p}\,|1\rangle_R$ holding the random process. Add a new register initially in
state $|0\rangle_C$ before applying the unitary transform $U : |r\rangle_R |0\rangle_C \mapsto |r\rangle_R |\phi^r\rangle_C$ for
$r \in \{0,1\}$, send register $C$ through the quantum channel and keep register $R$.

From the receiver's point of view, the purified behavior is indistinguishable
from the one relying upon a classical source of randomness because in both cases,
the state of register $C$ is $\rho = p\,|\phi^0\rangle\langle\phi^0| + (1 - p)\,|\phi^1\rangle\langle\phi^1|$. All operations invoking
classical randomness can be purified similarly B3B422)T7). The result is that
measurements are postponed as much as possible and only extract information
required to run the protocol in the sense that only when both players need
to know a random outcome, the corresponding quantum register holding the
random coin will be measured. If both players purify their actions then the joint
state at any point during the execution will remain pure, until the very last step
of the protocol when the outcomes are measured.
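
The indistinguishability claim for the purified instruction can be verified
directly for a small case. In the Python sketch below, the function names, the
value $p = 0.3$ and the choice $|\phi^0\rangle = |0\rangle$,
$|\phi^1\rangle = |+\rangle$ are assumptions made for illustration; the
purified state is stored as a coefficient matrix M[r, c] over registers $R$ and
$C$.

```python
import numpy as np

def classical_mixture(p, phi0, phi1):
    """State of register C when phi0 / phi1 is picked with probability p / 1-p."""
    return p * np.outer(phi0, phi0.conj()) + (1 - p) * np.outer(phi1, phi1.conj())

def purified_state(p, phi0, phi1):
    """Coefficients M[r, c] of sqrt(p)|0>_R|phi0>_C + sqrt(1-p)|1>_R|phi1>_C."""
    return np.vstack([np.sqrt(p) * phi0, np.sqrt(1 - p) * phi1])

def reduced_state_C(M):
    """Trace out register R: rho_C[c, c'] = sum_r M[r, c] * conj(M[r, c'])."""
    return M.T @ M.conj()

p = 0.3
phi0 = np.array([1.0, 0.0])                 # |0>
phi1 = np.array([1.0, 1.0]) / np.sqrt(2)    # |+>

rho_classical = classical_mixture(p, phi0, phi1)
M = purified_state(p, phi0, phi1)
rho_purified = reduced_state_C(M)

same_for_receiver = np.allclose(rho_classical, rho_purified)
joint_is_normalized = np.isclose(np.linalg.norm(M), 1.0)
```

The receiver's state is identical in both cases, while the sender still holds
register $R$ and the joint state remains a single normalized pure state.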


SECURE TWO-PARTY COMPUTATION. In Section 5 we investigate the leakage
of several universal cryptographic two-party primitives. By universality we mean 
that any two-party secure function evaluation can be reduced to them. We in- 
vestigate the completely randomized versions where players do not have inputs 
but receive randomized outputs instead. Throughout this paper, the term prim- 
itive usually refers to the joint probability distribution defining its randomized 
version. Any protocol implementing the standard version of a primitive (with in- 
puts) can also be used to implement a randomized version of the same primitive, 
with the “inputs” chosen according to an arbitrary fixed probability distribution. 


3 Two-Party Protocols and Their Embeddings 


3.1 Correctness 


In this work, we consider cryptographic primitives providing $X$ to honest player
Alice and $Y$ to honest player Bob according to a joint probability distribution
$P_{X,Y}$. The goal of this section is to define when a protocol $\pi$ correctly implements
the primitive $P_{X,Y}$. The first natural requirement is that once the actions of $\pi$ are
purified by both players, measurements of registers $A$ and $B$ in the computational
basis (see Footnote 7) provide joint outcome $(X,Y) = (x,y)$ with probability $P_{X,Y}(x,y)$.
Protocol $\pi$ can use extra registers $A'$ on Alice's and $B'$ on Bob's side
providing them with (quantum) working space. The purification of all actions of $\pi$
therefore generates a pure state $|\psi\rangle \in \mathcal{H}_{AB} \otimes \mathcal{H}_{A'B'}$. A second requirement for
the correctness of the protocol $\pi$ is that these extra registers are only used as
working space, i.e. the final state $|\psi\rangle_{ABA'B'}$ is such that the content of Alice's
working register $A'$ does not give her any further information about Bob's
output $Y$ than what she can infer from her honest output $X$ and vice versa for $B'$.
Formally, we require that $S(XA'; Y) = I(X;Y)$ and $S(X; YB') = I(X;Y)$ or
equivalently, that $A' \leftrightarrow X \leftrightarrow Y$ and $X \leftrightarrow Y \leftrightarrow B'$ form Markov chains (see Footnote 8).


Definition 3.1. A protocol $\pi$ for $P_{X,Y}$ is correct if measuring registers $A$ and
$B$ of its final state in the computational basis yields outcomes $X$ and $Y$ with
distribution $P_{X,Y}$ and the final state satisfies $S(X; YB') = S(XA'; Y) = I(X;Y)$
where $A'$ and $B'$ denote the extra working registers of Alice and Bob. The state
$|\psi\rangle \in \mathcal{H}_{AB} \otimes \mathcal{H}_{A'B'}$ is called an embedding of $P_{X,Y}$ if it can be produced by the
purification of a correct protocol for $P_{X,Y}$.


We would like to point out that our definition of correctness is stronger than the 
usual classical notion which only requires the correct distribution of the output 
of the honest players. For example, the trivial classical protocol for the primitive 
$P_{X,Y}$ in which Alice samples both players' outputs $X, Y$, sends $Y$ to Bob, but
keeps a copy of $Y$ for herself, is not correct according to our definition, because
it implements a fundamentally different primitive, namely $P_{XY,Y}$.


3.2 Regular Embeddings 


We call an embedding $|\psi\rangle_{ABA'B'}$ regular if the working registers $A', B'$ are empty.
Formally, let $\Theta_{n,m} := \{\theta : \{0,1\}^n \times \{0,1\}^m \to [0, 2\pi)\}$ be the set of functions
mapping bit-strings of length $m + n$ to real numbers between $0$ and $2\pi$.


Definition 3.2. For a joint probability distribution $P_{X,Y}$ where $X \in \{0,1\}^n$
and $Y \in \{0,1\}^m$, we define the set

$$\mathcal{E}(P_{X,Y}) := \Big\{ |\psi\rangle \in \mathcal{H}_{AB} : |\psi\rangle = \sum_{x \in \{0,1\}^n,\, y \in \{0,1\}^m} e^{i\theta(x,y)} \sqrt{P_{X,Y}(x,y)}\, |x,y\rangle_{AB},\ \theta \in \Theta_{n,m} \Big\},$$

and call any state $|\psi\rangle \in \mathcal{E}(P_{X,Y})$ a regular embedding of the joint probability
distribution $P_{X,Y}$.

7 It is clear that every quantum protocol for which the final measurement (providing
$(x,y)$ with distribution $P_{X,Y}$ to the players) is not in the computational basis can
be transformed into a protocol of the described form by two additional local unitary
transformations.

8 Markov chains with quantum ends have been defined in [[] and used in
subsequent works such as [2]. It is straightforward to verify that the entropic condition
$S(XA'; Y) = I(X;Y)$ is equivalent to $A' \leftrightarrow X \leftrightarrow Y$ being a Markov chain and
similarly for the other condition.


Clearly, any $|\psi\rangle \in \mathcal{E}(P_{X,Y})$ produces $(X,Y)$ with distribution $P_{X,Y}$ since the
probability that Alice measures $x$ and Bob measures $y$ in the computational basis
is $|\langle \psi | x,y \rangle|^2 = P_{X,Y}(x,y)$. In order to specify a particular regular embedding
one only needs to give the description of the phase function $\theta(x,y)$. We denote
by $|\psi_\theta\rangle \in \mathcal{E}(P_{X,Y})$ the quantum embedding of $P_{X,Y}$ with phase function $\theta$. The
constant function $\theta(x,y) := 0$ for all $x \in \{0,1\}^n, y \in \{0,1\}^m$ corresponds to
what we call canonical embedding $|\psi_0\rangle := \sum_{x,y} \sqrt{P_{X,Y}(x,y)}\, |x,y\rangle_{AB}$.

In Lemma[Z.3] below we show that every primitive Px y has a regular embed- 
ding which is in some sense the most secure among all embeddings of Px y. 
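
As a small sanity check of the definition of regular embeddings, the Python
sketch below builds the state vector for a given distribution and phase
function and confirms that a computational-basis measurement reproduces the
distribution regardless of the phases. The function names, the example
distribution and the randomly drawn phases are assumptions made for this
illustration.

```python
import numpy as np

def regular_embedding(P, theta=None):
    """State vector sum_{x,y} e^{i theta(x,y)} sqrt(P[x,y]) |x,y>, index x*|Y| + y."""
    P = np.asarray(P, dtype=float)
    theta = np.zeros_like(P) if theta is None else np.asarray(theta, dtype=float)
    return (np.exp(1j * theta) * np.sqrt(P)).ravel()

def measurement_distribution(psi, shape):
    """Born-rule probabilities of measuring (x, y) in the computational basis."""
    return (np.abs(psi) ** 2).reshape(shape)

# Hypothetical joint distribution of two bits.
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2 * np.pi, size=P.shape)   # arbitrary phase function

psi_canonical = regular_embedding(P)                # theta = 0 everywhere
psi_theta = regular_embedding(P, theta)

same_statistics = (
    np.allclose(measurement_distribution(psi_canonical, P.shape), P)
    and np.allclose(measurement_distribution(psi_theta, P.shape), P)
)
```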


3.3 Trivial Classical Primitives and Trivial Embeddings 


In this section, we define triviality of classical primitives and (bipartite)
embeddings. We show that for any non-trivial classical primitive, its canonical quantum
embedding is also non-trivial. Intuitively, a primitive $P_{X,Y}$ is trivial if $X$ and $Y$
can be generated by Alice and Bob from scratch in the classical
honest-but-curious (HBC) model (see Footnote 9). Formally, we define triviality via an entropic quantity
based on the notion of dependent part (see Section 2).


Definition 3.3. A primitive $P_{X,Y}$ is called trivial if it satisfies $H(X \searrow Y | Y) = 0$,
or equivalently, $H(Y \searrow X | X) = 0$. Otherwise, the primitive is called
non-trivial.


Definition 3.4. A regular embedding $|\psi\rangle_{AB} \in \mathcal{E}(P_{X,Y})$ is called trivial if either
$S(X \searrow Y | B) = 0$ or $S(Y \searrow X | A) = 0$. Otherwise, we say that $|\psi\rangle_{AB}$ is
non-trivial.


Notice that unlike in the classical case, $S(X \searrow Y | B) = 0 \Leftrightarrow S(Y \searrow X | A) = 0$
does not hold in general. As an example, consider a shared quantum state
where the computational basis corresponds to the Schmidt basis for only one
of its subsystems, say for $A$. Let $|\psi\rangle = \alpha|0\rangle_A|\xi_0\rangle_B + \beta|1\rangle_A|\xi_1\rangle_B$ be such that
both subsystems are two-dimensional, $\{|\xi_0\rangle, |\xi_1\rangle\} \ne \{|0\rangle, |1\rangle\}$, $\langle\xi_0|\xi_1\rangle = 0$, and
$|\langle\xi_0|0\rangle| \ne |\langle\xi_1|0\rangle|$. We then have $S(X|B) = 0$ and $S(Y|A) > 0$ while $X = X \searrow Y$
and $Y = Y \searrow X$.

To illustrate this definition of triviality, we argue in the following that if a
primitive $P_{X,Y}$ has a trivial regular embedding, there exists a classical protocol
which generates $X, Y$ securely in the HBC model. Let $|\psi\rangle \in \mathcal{E}(P_{X,Y})$ be trivial
and assume without loss of generality that $S(Y \searrow X \mid A) = 0$. Intuitively, this
means that Alice can learn everything possible about Bob’s outcome $Y$ ($Y$ could
include some private coin-flips on Bob’s side, but that is “filtered out” by the
dependent part). More precisely, Alice holding register $A$ can measure her part of
the shared state to completely learn a realization of $Y \searrow X$, specifying $P_{X|Y=y}$.
She then chooses $X$ according to the distribution $P_{X|Y=y}$. An equivalent way of
trivially generating $(X,Y)$ classically is the following classical protocol:


9 See Footnote P] for a caveat about this terminology.


1. Alice samples $P_{X|Y=y}$ from distribution $P_{Y \searrow X}$ and announces its outcome
to Bob. She samples $x$ from the distribution $P_{X|Y=y}$.
2. Bob picks $y$ with probability $P_{Y \mid Y \searrow X = P_{X|Y=y}}(y)$.


Of course, the same reasoning applies in case $S(X \searrow Y \mid B) = 0$ with the roles
of Alice and Bob reversed.
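
The two-step protocol above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a primitive is represented as a dictionary mapping pairs (x, y) to exact probabilities, and the names `dependent_part`, `is_trivial` and `protocol_output` are hypothetical helpers, not notation from the paper. The sketch computes the output distribution of the protocol exactly rather than sampling from it.

```python
from collections import defaultdict
from fractions import Fraction


def dependent_part(P):
    """Group Bob's values y by P_{X|Y=y}; returns {P_{X|Y=y}: {y: P_Y(y)}}."""
    p_y = defaultdict(Fraction)
    for (x, y), p in P.items():
        p_y[y] += p
    classes = defaultdict(dict)
    for y, q in p_y.items():
        cond = frozenset((x, p / q) for (x, yy), p in P.items() if yy == y and p > 0)
        classes[cond][y] = q
    return classes


def is_trivial(P):
    """Checks that X determines the dependent part of Y, i.e. every x is
    compatible with exactly one conditional distribution P_{X|Y=y}."""
    owner = {}
    for cond in dependent_part(P):
        for x, _ in cond:
            if owner.setdefault(x, cond) != cond:
                return False
    return True


def protocol_output(P):
    """Exact joint distribution produced by the two-step classical protocol."""
    out = defaultdict(Fraction)
    for cond, ys in dependent_part(P).items():
        p_class = sum(ys.values())          # step 1: Alice samples the class ...
        for x, p_x in cond:                 # ... and then x from P_{X|Y=y}
            for y, q in ys.items():         # step 2: Bob picks y given the class
                out[(x, y)] += p_class * p_x * (q / p_class)
    return dict(out)


quarter = Fraction(1, 4)
# X is a uniform bit; Bob receives X together with a private coin c.
coin_example = {(x, (x, c)): quarter for x in (0, 1) for c in (0, 1)}
# Rabin OT: Bob receives X or an erasure "e", each with probability 1/2.
rabin_ot = {(0, 0): quarter, (0, "e"): quarter, (1, 1): quarter, (1, "e"): quarter}
```

In this sketch `coin_example` is trivial and `rabin_ot` is not. The protocol reproduces the joint distribution in both cases; triviality is what makes the announcement in step 1 harmless, since Alice could infer the announced value from her own output anyway.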

In fact, the following lemma (whose proof can be found in the full version [33])
shows that any non-trivial primitive $P_{X,Y}$ has a non-trivial embedding, i.e. there
exists a quantum protocol correctly implementing $P_{X,Y}$ while leaking less
information to QHBC adversaries than any classical protocol for $P_{X,Y}$ in the HBC
model.


Lemma 3.5. If $P_{X,Y}$ is a non-trivial primitive then the canonical embedding
$|\psi_0\rangle \in \mathcal{E}(P_{X,Y})$ is also non-trivial.


4 The Leakage of Quantum Embeddings 


We formally define the leakage of embeddings and establish properties of the
leakage. The proofs of all statements in this section can be found in the full
version [33].


4.1 Definition and Basic Properties of Leakage 


A perfect implementation of $P_{X,Y}$ simply provides $X$ to Alice and $Y$ to Bob and
does nothing else. The expected amount of information that one random variable
gives about the other is $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = I(Y;X)$.
Intuitively, we define the leakage of a quantum embedding $|\psi\rangle_{ABA'B'}$
of $P_{X,Y}$ as the larger of the two following quantities: the extra amount of
information Bob’s quantum registers $BB'$ provide about $X$ and the extra amount
Alice’s quantum state in $AA'$ provides about $Y$, respectively, in comparison to
“the minimum amount” $I(X;Y)$.¹⁰


Definition 4.1. Let $|\psi\rangle \in \mathcal{H}_{ABA'B'}$ be an embedding of $P_{X,Y}$. We define the
leakage of $|\psi\rangle$ as

$$\Delta_\psi(P_{X,Y}) := \max\{\, S(X;BB') - I(X;Y),\; S(AA';Y) - I(X;Y) \,\}.$$

Furthermore, we say that $|\psi\rangle$ is $\delta$-leaking if $\Delta_\psi(P_{X,Y}) \geq \delta$.


10 There are other natural candidates for the notion of leakage such as the difference in 
difficulty between guessing Alice’s output X by measuring Bob’s final quantum state 
B and based on the output of the ideal functionality Y. While such definitions do 
make sense, they turn out not to be as easy to work with and it is an open question 
whether the natural properties described later in this section can be established for 
these notions of leakage as well. 
It is easy to see that the leakage is non-negative since $S(X;BB') \geq S(X;\tilde{B})$ for $\tilde{B}$
the result of a quantum operation applied to $BB'$. Such an operation could be the
trace over the extra working register $B'$ and a measurement in the computational
basis of each qubit of the part encoding $Y$, yielding $S(X;\tilde{B}) = I(X;Y)$.

We want to argue that our notion of leakage is a good measure for the privacy 
of the player’s outputs. In the same spirit, we will argue that the minimum 
achievable leakage for a primitive is related to the “hardness” of implementing 
it. We start off by proving several basic properties about leakage. 

For a general state in $\mathcal{H}_{ABA'B'}$ the quantities $S(X;BB') - I(X;Y)$ and
$S(AA';Y) - I(X;Y)$ are not necessarily equal. Note though that they coincide
for regular embeddings $|\psi\rangle \in \mathcal{E}(P_{X,Y})$ produced by a correct protocol (where
the work spaces $A'$ and $B'$ are empty): Notice that $S(X;B) = S(X) + S(B) - S(X,B)
= H(X) + S(B) - H(X) = S(B)$ and, because $|\psi\rangle$ is pure, $S(A) = S(B)$.
Therefore, $S(X;B) = S(A;Y)$ and the two quantities coincide. The following
lemma states that this actually happens for all embeddings and hence the
definition of leakage is symmetric with respect to both players.


Lemma 4.2 (Symmetry). Let $|\psi\rangle \in \mathcal{H}_{ABA'B'}$ be an embedding of $P_{X,Y}$. Then,
$$\Delta_\psi(P_{X,Y}) = S(X;BB') - I(X;Y) = S(AA';Y) - I(X;Y).$$
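
For a regular embedding the computation above reduces the leakage to $S(A) - I(X;Y)$. As a small numerical illustration, the following Python sketch evaluates this quantity for the canonical embedding of a primitive whose $X$ is a single bit, so that the reduced state on $A$ is a real symmetric 2x2 matrix with a closed-form spectrum. The restriction to binary $X$, the dictionary representation of the primitive and the function names are our own assumptions for the sake of a dependency-free example.

```python
import math


def binary_entropy(p):
    """h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(P):
    """I(X;Y) in bits for a joint distribution P[(x, y)]."""
    p_x, p_y = {}, {}
    for (x, y), p in P.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in P.items() if p > 0)


def canonical_leakage(P):
    """S(A) - I(X;Y) for the canonical embedding; X must take values in {0, 1}."""
    ys = {y for (_, y) in P}

    def entry(x, xp):
        # <x| rho_A |x'> = sum_y sqrt(P(x,y) P(x',y)) for the canonical embedding.
        return sum(math.sqrt(P.get((x, y), 0.0) * P.get((xp, y), 0.0)) for y in ys)

    a, b, d = entry(0, 0), entry(0, 1), entry(1, 1)
    top = (a + d + math.sqrt((a - d) ** 2 + 4 * b * b)) / 2   # larger eigenvalue
    return binary_entropy(top) - mutual_information(P)


# Randomized Rabin OT of one bit: Bob gets X or an erasure "e" with probability 1/2.
rot1 = {(0, 0): 0.25, (0, "e"): 0.25, (1, 1): 0.25, (1, "e"): 0.25}
# A trivial primitive: both players receive the same uniform bit.
shared_bit = {(0, 0): 0.5, (1, 1): 0.5}
```

Under these assumptions `canonical_leakage(rot1)` evaluates to $h(1/4) - 1/2$, while `canonical_leakage(shared_bit)` is zero, as expected for a trivial primitive.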


The next lemma shows that the leakage of an embedding of a given primitive is 
lower-bounded by the leakage of some regular embedding of the same primitive, 
which simplifies the calculation of lower bounds for the leakage of embeddings. 


Lemma 4.3. For every embedding $|\psi\rangle$ of a primitive $P_{X,Y}$, there is a regular
embedding $|\psi'\rangle$ of $P_{X,Y}$ such that $\Delta_\psi(P_{X,Y}) \geq \Delta_{\psi'}(P_{X,Y})$.


So far, we have defined the leakage of an embedding of a primitive. The natural 
definition of the leakage of a primitive is the following. 


Definition 4.4. We define the leakage of a primitive $P_{X,Y}$ as the minimal
leakage among all protocols correctly implementing $P_{X,Y}$. Formally,

$$\Delta_{P_{X,Y}} := \min_{|\psi\rangle} \Delta_\psi(P_{X,Y}),$$

where the minimization is over all embeddings $|\psi\rangle$ of $P_{X,Y}$.


Notice that the minimum in the previous definition is well-defined, because by
Lemma 4.3 it is sufficient to minimize over regular embeddings $|\psi\rangle \in \mathcal{E}(P_{X,Y})$.
Furthermore, the function $\Delta_\psi(P_{X,Y})$ is continuous on the compact (i.e. closed
and bounded) set $[0,2\pi]^{|\mathcal{X} \times \mathcal{Y}|}$ of complex phases corresponding to elements
$|x,y\rangle_{AB}$ in the formula for $|\psi\rangle_{AB} \in \mathcal{E}(P_{X,Y})$ and therefore it achieves
its minimum.

The following theorem shows that the leakage of any embedding of a primitive
$P_{X,Y}$ is lower-bounded by the minimal leakage achievable for primitive
$P_{X \searrow Y, Y \searrow X}$ (which due to Lemma 4.3 is achieved by a regular embedding).


Theorem 4.5. For any primitive $P_{X,Y}$, $\Delta_{P_{X,Y}} \geq \Delta_{P_{X \searrow Y, Y \searrow X}}$.


Proof (Sketch). The proof idea is to pre-process the registers storing $X$ and $Y$
in a way allowing Alice and Bob to convert a regular embedding of $P_{X,Y}$ (for
which the minimum leakage is achieved) into a regular embedding of $P_{X \searrow Y, Y \searrow X}$
by measuring parts of these registers. It follows that on average, the leakage of
the resulting regular embedding of $P_{X \searrow Y, Y \searrow X}$ is at most the leakage of the
embedding of $P_{X,Y}$ the players started with. Hence, there must be a regular
embedding of $P_{X \searrow Y, Y \searrow X}$ leaking at most as much as the best embedding of
$P_{X,Y}$. See [33] for the complete proof.


4.2 Leakage as Measure of Privacy and Hardness of Implementation 


The main results of this section are consequences of the Holevo bound
(Theorem 2.1).


Theorem 4.6. If a two-party quantum protocol provides the correct outcomes
of $P_{X,Y}$ to the players without leaking extra information, then $P_{X,Y}$ must be a
trivial primitive.


Proof. Theorem 4.5 implies that if there is a 0-leaking embedding of $P_{X,Y}$ then
there is also a 0-leaking embedding of $P_{X \searrow Y, Y \searrow X}$. Let us therefore assume
that $|\psi\rangle$ is a non-leaking embedding of $P_{X,Y}$ such that $X = X \searrow Y$ and
$Y = Y \searrow X$. We can write $|\psi\rangle$ in the form $|\psi\rangle = \sum_x \sqrt{P_X(x)}\, |x\rangle |\varphi_x\rangle$ and get
$\rho_B = \sum_x P_X(x)\, |\varphi_x\rangle\langle\varphi_x|$. For the leakage of $|\psi\rangle$ we have:
$\Delta_\psi(P_{X,Y}) = S(X;B) - I(X;Y) = S(\rho_B) - I(X;Y) = 0$. From the Holevo bound
(Theorem 2.1) it follows that the states $\{|\varphi_x\rangle\}_x$ form an orthonormal basis of
their span (since $X = X \searrow Y$, they are all different) and that $Y$ captures the
result of a measurement in this basis, which therefore is the computational basis.
Since $Y = Y \searrow X$, we get that for each $x$, there is a single $y_x \in \mathcal{Y}$ such that
$|\varphi_x\rangle = |y_x\rangle$. The primitives $P_{X \searrow Y, Y \searrow X}$ and $P_{X,Y}$ are therefore trivial.


In other words, the only primitives that two-party quantum protocols can imple- 
ment correctly (without the help of a trusted third party) and without leakage 
are the trivial ones! We note that it is not necessary to use the strict notion of
correctness from Definition 3.1 in this theorem, but a more complicated proof
can be done solely based on the correct distribution of the values. This result
can be seen as a quantum extension of the corresponding characterization for the 
cryptographic power of classical protocols in the HBC model. Whereas classical 
two-party protocols cannot achieve anything non-trivial, their quantum counter- 
parts necessarily leak information when they implement non-trivial primitives. 
The notion of leakage can be extended to protocols involving a trusted third 
party (see [33]). A special case of such protocols are the ones where the players 
are allowed one call to a black box for a certain non-trivial primitive. It is 
natural to ask which primitives can be implemented without leakage in this case. 
As it turns out, the monotones $H(X \searrow Y \mid Y)$ and $H(Y \searrow X \mid X)$, introduced
in 6], are also monotones for quantum computation, in the sense that all joint 
random variables $X', Y'$ that can be generated by quantum players without
leakage using one black-box call to $P_{X,Y}$ satisfy $H(X' \searrow Y' \mid Y') \leq H(X \searrow Y \mid Y)$
and $H(Y' \searrow X' \mid X') \leq H(Y \searrow X \mid X)$.


Theorem 4.7. Suppose that primitives $P_{X,Y}$ and $P_{X',Y'}$ satisfy
$H(X' \searrow Y' \mid Y') > H(X \searrow Y \mid Y)$ or $H(Y' \searrow X' \mid X') > H(Y \searrow X \mid X)$. Then any
implementation of $P_{X',Y'}$ using just one call to the ideal functionality for $P_{X,Y}$
leaks information.
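
To make the condition of Theorem 4.7 concrete, the following Python sketch evaluates the monotone $H(X \searrow Y \mid Y)$ for small primitives. It is an illustration only: the dictionary representation of a primitive and the helper name `monotone` are our own assumptions, and the two example distributions are the randomized one-bit Rabin OT and the randomized 1-2-OT on bits.

```python
import math
from collections import defaultdict
from fractions import Fraction


def monotone(P):
    """H(X↘Y | Y) in bits for a joint distribution P[(x, y)] of Fractions."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in P.items():
        p_x[x] += p
        p_y[y] += p
    # The dependent part X↘Y maps x to the conditional distribution P_{Y|X=x}.
    label = {
        x: frozenset((y, p / p_x[x]) for (xx, y), p in P.items() if xx == x and p > 0)
        for x in p_x
    }
    joint = defaultdict(Fraction)           # distribution of the pair (X↘Y, Y)
    for (x, y), p in P.items():
        joint[(label[x], y)] += p
    return -sum(float(p) * math.log2(float(p / p_y[y]))
                for (_, y), p in joint.items() if p > 0)


quarter, eighth = Fraction(1, 4), Fraction(1, 8)
# Randomized one-bit Rabin OT: Bob gets X or an erasure "e".
rot1 = {(0, 0): quarter, (0, "e"): quarter, (1, 1): quarter, (1, "e"): quarter}
# Randomized 1-2-OT: Alice gets (x0, x1), Bob gets (c, x_c).
ot12 = {((x0, x1), (c, (x0, x1)[c])): eighth
        for x0 in (0, 1) for x1 in (0, 1) for c in (0, 1)}
```

With these examples the monotone equals 1/2 for `rot1` and 1 for `ot12`, so, reading Theorem 4.7 with 1-2-OT in the role of $P_{X',Y'}$, any implementation of 1-2-OT from a single call to one-bit Rabin OT leaks information.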


4.3 Reducibility of Primitives and Their Leakage 


This section is concerned with the following question: Given two primitives $P_{X,Y}$
and $P_{X',Y'}$ such that $P_{X,Y}$ is reducible to $P_{X',Y'}$, what is the relationship
between the leakage of $P_{X,Y}$ and the leakage of $P_{X',Y'}$? We use the notion of
reducibility in the following sense: We say that a primitive $P_{X,Y}$ is reducible in
the HBC model to a primitive $P_{X',Y'}$ if $P_{X,Y}$ can be securely implemented in
the HBC model from (one call to) a secure implementation of $P_{X',Y'}$. The above
question can also be generalized to the case where $P_{X,Y}$ can be computed from
$P_{X',Y'}$ only with certain probability. Notice that the answer, even if we assume
perfect reducibility, is not captured in our previous result from Lemma 4.3, since
an embedding of $P_{X',Y'}$ is not necessarily an embedding of $P_{X,Y}$ (it might
violate the correctness condition). However, under certain circumstances, we can
show that $\Delta_{P_{X',Y'}} \geq \Delta_{P_{X,Y}}$.


Theorem 4.8. Assume that primitives $P_{X,Y}$ and $P_{X',Y'} = P_{X'_0 X'_1, Y'_0 Y'_1}$ satisfy
the condition:

$$\sum_{x,y \,:\, P_{X'_1,Y'_1 \mid X'_0 = x, Y'_0 = y} \,\simeq\, P_{X,Y}} P_{X'_0,Y'_0}(x,y) \geq 1 - \delta,$$

where the relation $\simeq$ means that the two distributions are equal up to relabeling
of the alphabet. Then, $\Delta_{P_{X',Y'}} \geq (1 - \delta)\, \Delta_{P_{X,Y}}$.


This theorem allows us to derive a lower bound on the leakage of 1-out-of-2
Oblivious Transfer of $r$-bit strings in Section 5.


5 The Leakage of Universal Cryptographic Primitives 


In this section, we exhibit lower bounds on the leakage of some universal two-party
primitives. In the following table, ROT$^r$ denotes the $r$-bit string version
of randomized Rabin OT, where Alice receives a random $r$-bit string and Bob
receives the same string or an erasure symbol, each with probability 1/2.
Similarly, 1-2-OT$^r$ denotes the string version of 1-2-OT, where Alice receives two
$r$-bit strings and Bob receives one of them. By 1-2-OT$_p$ we denote the noisy
version of 1-2-OT, where the 1-2-OT functionality is implemented correctly only
with probability $1 - p$. Table 1 summarizes the lower bounds on the leakage of
these primitives (the derivations can be found in the full version [33]). We note
that Wolf and Wullschleger have shown that a randomized 1-2-OT can be
transformed by local operations into an additive sharing of an AND (here called
SAND). Therefore, our results for 1-2-OT below also apply to SAND.


Table 1. Lower bounds on the leakage for universal two-party primitives

primitive   | leaking at least               | comments
ROT$^1$     | $h(1/4) - 1/2 \approx 0.311$   | same leakage for all regular embeddings
ROT$^r$     | (1 ...                         | same leakage for all regular embeddings
1-2-OT      |                                | minimized by canonical embedding
1-2-OT$^r$  |                                | (suboptimal) lower bound
1-2-OT$_p$  |                                | if $p < \sin^2(\pi/8) \approx 0.15$, (suboptimal) lower bound

1-2-OT$^r$ and 1-2-OT$_p$ are primitives where the direct evaluation of the leakage
for a general embedding $|\psi_\theta\rangle$ is hard, because the number of possible phases
increases exponentially in the number of qubits. Instead of computing $S(A)$
directly, we derive (suboptimal) lower bounds on the leakage.

Based on the examples of ROT$^r$ and 1-2-OT, it is tempting to conjecture that
the leakage is always minimized for the canonical embedding, which agrees with 
the geometric intuition that the minimal pairwise distinguishability of quantum 
states in a mixture minimizes the von Neumann entropy of the mixture. However, 
Jozsa and Schlienz have shown that this intuition is sometimes incorrect [0]. 
In a quantum system of dimension at least three, we can have the following
situation: For two sets of pure states $\{|u_i\rangle\}_{i=1}^n$ and $\{|v_i\rangle\}_{i=1}^n$ satisfying
$|\langle u_i | u_j \rangle| \leq |\langle v_i | v_j \rangle|$ for all $i, j$, there exist probabilities $p_i$ such that for
$\rho_u := \sum_i p_i |u_i\rangle\langle u_i|$, $\rho_v := \sum_i p_i |v_i\rangle\langle v_i|$, it holds that $S(\rho_u) < S(\rho_v)$.
As we can see, although each pair $|u_i\rangle, |u_j\rangle$ is more distinguishable than the
corresponding pair $|v_i\rangle, |v_j\rangle$, the overall $\rho_u$ provides us with less uncertainty
than $\rho_v$. It follows that although for the canonical embedding
$|\psi_0\rangle = \sum_y |\varphi_y\rangle |y\rangle$ of $P_{X,Y}$ the mutual overlaps $|\langle \varphi_y | \varphi_{y'} \rangle|$ are clearly
maximized, it does not necessarily imply that $S(A)$ in this case is minimal over
$\mathcal{E}(P_{X,Y})$. It is an interesting open question to find a primitive whose canonical
embedding does not minimize the leakage or to prove that no such primitive exists.

For the primitive 1-2-OT$_p$, our lower bound on the leakage only holds for
$p < \sin^2(\pi/8) \approx 0.15$. Notice that in reality, the leakage is strictly positive for any
embedding of 1-2-OT$_p$ with $p < 1/4$, since for $p < 1/4$, 1-2-OT$_p$ is a non-trivial
primitive. On the other hand, 1-2-OT$_{1/4}$ is a trivial primitive implemented securely
by the following protocol in the classical HBC model:


1. Alice chooses randomly between her input bits $x_0$ and $x_1$ and sends the
chosen value $x_a$ to Bob.
2. Bob chooses his selection bit $c$ uniformly at random and sets $y := x_a$.


Equality $x_c = y$ is satisfied if either $a = c$, which happens with probability
1/2, or if $a \neq c$ and $x_a = x_{1-a}$, which happens with probability 1/4. Since the
two events are disjoint, it follows that $x_c = y$ with probability 3/4 and that
the protocol implements 1-2-OT$_{1/4}$. The implementation is clearly secure against
honest-but-curious Alice, since she does not receive any message from Bob. It
is also secure against Bob, since he receives only one bit from Alice. By letting
Alice randomize the value of the bit she is sending, the players can implement
1-2-OT$_p$ securely for any value $1/4 < p < 1/2$.
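
The probability 3/4 can be checked by exhaustive enumeration. The short Python sketch below assumes, as in the argument above, that Alice's bits $x_0, x_1$, her choice $a$ and Bob's selection bit $c$ are independent and uniform; the function name is ours.

```python
from fractions import Fraction
from itertools import product


def success_probability():
    """Probability that Bob's output y = x_a equals x_c in the classical protocol."""
    outcomes = list(product((0, 1), repeat=4))   # all (x0, x1, a, c), equally likely
    good = sum(1 for x0, x1, a, c in outcomes if (x0, x1)[c] == (x0, x1)[a])
    return Fraction(good, len(outcomes))
```

Calling `success_probability()` counts 8 outcomes with $a = c$ and 4 further outcomes with $a \neq c$ and $x_0 = x_1$ out of 16.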


6 Conclusion and Open Problems 


We have provided a quantitative extension of qualitative impossibility results 
for two-party quantum cryptography. All non-trivial primitives leak information 
when implemented by quantum protocols. Notice that demanding a protocol to
be non-leaking does in general not imply the privacy of the players’ outputs.
For instance, consider a protocol implementing 1-2-OT but allowing a curious
receiver with probability 1/2 to learn both bits simultaneously or with probability
1/2 to learn nothing about them. Such a protocol for 1-2-OT would be non-leaking
but nevertheless insecure. Consequently, Theorem 4.6 not only tells us that any
quantum protocol implementing a non-trivial primitive must be insecure, but
also that a privacy breach will reveal itself as leakage. Our framework allows us to
quantify the leakage of any two-party quantum protocol correctly implementing
a primitive. The impossibility results obtained here are stronger than standard
ones since they only rely on the cryptographic correctness of the protocol. Fur- 
thermore, we present lower bounds on the leakage of some universal two-party 
primitives. 

A natural open question is to find a way to identify good embeddings for a 
given primitive. In particular, how far can the leakage of the canonical embedding 
be from the best one? Such a characterization, even if only applicable to special
primitives, would allow us to lower-bound their leakage and would also help to
understand the power of two-party quantum cryptography in a more concise way. 

It would also be interesting to find a measure of cryptographic non-triviality 
for two-party primitives and to see how it relates to the minimum leakage of any 
implementation by quantum protocols. For instance, is it true that quantum 
protocols for primitive Px y leak more if the minimum (total variation) distance 
between Px y and any trivial primitive increases? 

Another question we leave for future research is to define and investigate other
notions of leakage, e.g. in the one-shot setting instead of in the asymptotic regime
(as outlined in Footnote 10). Results in the one-shot setting have already been
established for data compression [30], channel capacities [31], state-merging
and other (quantum-) information-theoretic tasks.

Furthermore, it would be interesting to find more applications for the concept
of leakage, considered also for protocols using an environment as a trusted third
party. In this direction, we have shown in Theorem 4.7 that any two-party quantum
protocol for a given primitive, using a black box for an “easier” primitive,
leaks information. Lower-bounding this leakage is an interesting open question.
We might also ask how many copies of the “easier” primitive are needed to
implement the “harder” primitive by a quantum protocol, which would give us
an alternative measure of non-triviality of two-party primitives.

Abstract. Code-based cryptography is often viewed as an interesting
“Post-Quantum” alternative to the classical number theory cryptogra- 
phy. Unlike many other such alternatives, it has the convenient advan- 
tage of having only a few, well identified, attack algorithms. However, 
improvements to these algorithms have made their effective complexity 
quite complex to compute. We give here some lower bounds on the work 
factor of idealized versions of these algorithms, taking into account all 
possible tweaks which could improve their practical complexity. The aim 
of this article is to help designers select durably secure parameters. 


Keywords: computational syndrome decoding, information set decod- 
ing, generalized birthday algorithm. 


Introduction 


Code-based cryptography has received renewed attention with the recent interest 
for “Post-Quantum Cryptography” (see for instance [5]). Several new interesting 
proposals have been published in the last few months BOIS]. For those new 
constructions as well as for previously known code-based cryptosystems, precise 
parameters selection is always a sensitive issue. Most of the time the most threat- 
ening attacks are based on decoding algorithms for generic linear codes. There 
are two main families of algorithms, Information Set Decoding (ISD) and
Generalized Birthday Algorithm (GBA), each family being suited to a different
range of parameters.

ISD is part of the folklore of algorithmic coding theory and is among the most
efficient techniques for decoding errors in an arbitrary linear code. One major
step in the development of ISD for the cryptanalysis of the McEliece encryption
scheme is Stern’s variant [22] which mixes birthday attack with the traditional
approach. A first implementation description [10], with several improvements,
led to an attack of $2^{64.2}$ binary operations for the original McEliece parameters,
that is decoding 50 errors in a code of length 1024 and dimension 524. More
recently [6], a new implementation was proposed with several new improvements,
with a binary workfactor of $2^{60.5}$. Furthermore, the authors report a real attack
(with the original parameters) with a computational effort of about $2^{58}$ CPU
cycles. The above numbers are accurate estimates of the real cost of a decoding
attack. They involve several parameters that have to be optimized and furthermore,
no closed formula exists, making a precise evaluation rather difficult.

GBA was introduced by Wagner in 2002 but was not specifically designed 
for decoding. Less generic versions of this algorithm had already been used in
the past for various cryptanalytic applications PEL]. Its first successful use to 
cryptanalyse a code-based system is due to Coron and Joux [J]. In particular, 
this work had a significant impact for selecting the parameters of the FSB hash 
function [1].

Most previous papers on decoding attacks were written from the point of view 
of the attacker and were looking for upper bounds on the work factor of some 
specific implementation. One exception is the asymptotic analysis for ISD that 
has been recently presented in [8]. Here we propose a designer approach and we 
aim at providing tools to easily select secure parameters. 

For both families, we present new idealized versions of the algorithms, which
encompass all variants and improvements known in cryptology as well as some 
new optimizations. This allows us to give easy to compute lower bounds for 
decoding attacks up to the state of the art. 

We successively study three families of algorithms, first the “standard” birth- 
day attack, then two evolutions of this technique, namely Stern’s variant of infor- 
mation set decoding and Wagner’s generalized birthday algorithm. In each case 
we propose very generic lower bounds on their complexity. Finally, we illustrate 
our work with case studies of some of the main code-based cryptosystems. 


1 The Decoding Problem in Cryptology 


Problem 1 (Computational Syndrome Decoding - CSD). Given a matrix
$H \in \{0,1\}^{r \times n}$, a word $s \in \{0,1\}^r$ and an integer $w > 0$, find a word $e \in \{0,1\}^n$
of Hamming weight $\leq w$ such that $eH^T = s$.


We will denote CSD(H, s,w) an instance of that problem. It is equivalent to 
decoding w errors in a code with parity check matrix H. The decision problem 
associated with computational syndrome decoding, namely, Syndrome Decoding, 
is NP-complete l. 
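
For very small instances the problem can be solved by exhaustive search, which makes the definition concrete. The Python sketch below is only a toy illustration: the representation of $H$ as a list of rows, the function names and the example matrix are our own choices, and the running time grows like the number of words of weight at most $w$.

```python
from itertools import combinations


def syndrome(H, e):
    """Compute e H^T over GF(2); H is a list of r rows, each of length n."""
    return [sum(h * x for h, x in zip(row, e)) % 2 for row in H]


def csd_bruteforce(H, s, w):
    """Return some e of Hamming weight <= w with e H^T = s, or None."""
    n = len(H[0])
    for weight in range(w + 1):
        for support in combinations(range(n), weight):
            e = [1 if j in support else 0 for j in range(n)]
            if syndrome(H, e) == s:
                return e
    return None


# Toy parity check matrix with r = 3 and n = 6 (hypothetical example).
H_toy = [[1, 0, 0, 1, 1, 0],
         [0, 1, 0, 1, 0, 1],
         [0, 0, 1, 0, 1, 1]]
```

For instance, `csd_bruteforce(H_toy, [1, 1, 0], 1)` finds the single column equal to the syndrome, while the syndrome `[1, 1, 1]` has no solution of weight 1 and needs `w = 2`.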

This problem appears in code-based cryptography and for most systems it is 
the most threatening known attack (sometimes the security can be reduced to 
CSD alone [[23]). Throughout the paper we will denote 


$$W_{n,w} = \{ e \in \{0,1\}^n \mid \mathrm{wt}(e) = w \}$$


the set of all binary words of length n and Hamming weight w. The instances 
of CSD coming from cryptology usually have solutions. Most of the time, this 
solution is unique. This is the case for public-key encryption schemes [7E] or 
for identification schemes [23,24]. However, if the number $w$ of errors is larger
than the Gilbert-Varshamov distance,¹ we may have a few, or even a large number,
of solutions. Obtaining one of them is enough. This is the case for digital
signatures [13] or for hashing [MB].


1 The Gilbert-Varshamov distance is the smallest integer $d_0$ such that $\binom{n}{d_0} \geq 2^r$.
2 The Birthday Attack for Decoding


We consider an instance CSD$(H, s, w)$ of the computational syndrome decoding.
If the weight $w$ is even, we partition the columns of $H$ in two subsets (a priori
of equal size). For instance, let $H = (H_1 \mid H_2)$ and let us consider the sets
$L_1 = \{ e_1 H_1^T \mid e_1 \in W_{n/2,w/2} \}$ and $L_2 = \{ s + e_2 H_2^T \mid e_2 \in W_{n/2,w/2} \}$. Any
element of $L_1 \cap L_2$ provides a pair $(e_1, e_2)$ such that $e_1 H_1^T = s + e_2 H_2^T$ and
$e_1 + e_2$ is a solution to CSD$(H, s, w)$. This collision search has to be repeated
$1/\Pr_{n,w}$ times on average where $\Pr_{n,w}$ is the probability that one of the solutions
splits evenly between the left and right parts of $H$. Let $C_{n,w}$ denote the total
number of column sums we have to compute. If the solution is unique, we
have²

$$\Pr_{n,w} = \frac{\binom{n/2}{w/2}^2}{\binom{n}{w}} \quad\text{and}\quad C_{n,w} = \frac{|L_1| + |L_2|}{\Pr_{n,w}} = \frac{2\binom{n}{w}}{\binom{n/2}{w/2}} \approx 2\sqrt{\binom{n}{w}}\; \sqrt[4]{\frac{\pi w}{2}}.$$


This number is close to the improvement expected when the birthday paradox
can be applied (i.e. replacing an enumeration of $N$ elements by an enumeration
of $2\sqrt{N}$ elements). In this section, we will show that the factor $\sqrt[4]{\pi w/2}$ can be
removed and that the formula often applies when $w$ is odd. We will also provide
cost estimations and bounds.
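
As a quick numerical check of the expressions for $\Pr_{n,w}$ and $C_{n,w}$, the following Python sketch evaluates them exactly and compares $C_{n,w}$ with the Stirling approximation. The function name is ours and the call on the original McEliece parameters ($n = 1024$, $w = 50$) is only an example.

```python
from math import comb, pi, sqrt


def birthday_split(n, w):
    """Return (Pr_{n,w}, exact C_{n,w}, approximate C_{n,w}) for even n and w."""
    half = comb(n // 2, w // 2)
    pr = half ** 2 / comb(n, w)                     # even-split probability
    c_exact = 2 * half / pr                         # (|L1| + |L2|) / Pr
    c_approx = 2 * sqrt(comb(n, w)) * (pi * w / 2) ** 0.25
    return pr, c_exact, c_approx


pr, c_exact, c_approx = birthday_split(1024, 50)
```

For these parameters the solution splits evenly roughly once in nine trials, and the exact and approximate values of $C_{n,w}$ agree to within a few percent, which is consistent with $w \ll n$.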


2.1 A Decoding Algorithm Using the Birthday Paradox 


The algorithm presented in Table 1 generalizes the birthday attack for decoding
presented above. For any fixed values of $n$, $r$ and $w$ this algorithm uses three
parameters (to be optimized): an integer $\ell$ and two sets of constant weight words
$W_1$ and $W_2$.

The idea is to operate as much as possible with partial syndromes of size $\ell < r$
and to make the full comparison on $r$ bits only when we have a partial match.
Increasing the size of $W_1$ (and $W_2$) will lead to a better trade-off, ideally with a
single execution of (MAIN LOOP).


Definition 1. For any fixed value of $n$, $r$ and $w$, we denote by $WF_{BA}(n,r,w)$ the
minimal binary work factor (average cost in binary operations) of the algorithm
of Table 1 to produce a solution to CSD, for any choices of parameters $W_1$, $W_2$
and $\ell$.


An Estimation of the Cost. We will use the following assumptions (discussed 
in appendix): 


(B1) For all pairs $(e_1, e_2)$ examined in the algorithm, the sums $e_1 + e_2$ are
uniformly and independently distributed in $W_{n,w}$.

(B2) The cost of the execution of the algorithm is approximately equal to

$$\ell \cdot \sharp(\text{BA 1}) + \ell \cdot \sharp(\text{BA 2}) + K_0 \cdot \sharp(\text{BA 3}), \qquad (1)$$

where $K_0$ is the cost for testing $e_1 H^T = s + e_2 H^T$ given that $h_\ell(e_1 H^T) =
h_\ell(s + e_2 H^T)$ and $\sharp(\text{BA } i)$ is the expected number of executions of the
instruction (BA i) before we meet the (SUCCESS) condition.


2 We use Stirling’s formula to approximate factorials. The approximation we give is
valid because $w \ll n$.


Table 1. Birthday decoding algorithm

For any fixed values of $n$, $r$ and $w$, the following algorithm uses three parameters:
an integer $\ell > 0$, $W_1 \subset W_{n,\lfloor w/2 \rfloor}$ and $W_2 \subset W_{n,\lceil w/2 \rceil}$. We denote by
$h_\ell(x)$ the first $\ell$ bits of any $x \in \{0,1\}^r$.

procedure BirthdayDecoding
  input: H_0 ∈ {0,1}^{r×n}, s ∈ {0,1}^r
  repeat                                     (MAIN LOOP)
    P ← random n × n permutation matrix
    H ← H_0 P
    for all e ∈ W_1
      i ← h_ℓ(e H^T)                         (BA 1)
      write(e, i)    // store e in some data structure at index i
    for all e_2 ∈ W_2
      i ← h_ℓ(s + e_2 H^T)                   (BA 2)
      S ← read(i)    // extract the elements stored at index i
      for all e_1 ∈ S
        if e_1 H^T = s + e_2 H^T             (BA 3)
          return (e_1 + e_2) P^T             (SUCCESS)
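
A toy version of the birthday decoding procedure can be written as follows. This Python sketch makes several simplifying assumptions of our own: column $j$ of $H_0$ is packed into an integer, the random permutation is realized by shuffling the column indices, $W_1$ and $W_2$ are all words of weight $\lfloor w/2 \rfloor$ and $\lceil w/2 \rceil$ supported on the left and right halves of the permuted coordinates, and $h_\ell$ keeps the $\ell$ low-order bits of the packed syndrome. It is meant for small parameters only.

```python
import random
from itertools import combinations


def birthday_decode(H_cols, s, w, ell, max_iter=1000, rng=random):
    """Return the support of some e of weight w with e H^T = s, or None.

    H_cols[j] is column j of H packed into an int; s is packed the same way.
    """
    n = len(H_cols)
    mask = (1 << ell) - 1                        # h_ell: keep ell bits of a syndrome
    for _ in range(max_iter):                    # (MAIN LOOP)
        perm = list(range(n))
        rng.shuffle(perm)                        # random column permutation
        left, right = perm[: n // 2], perm[n // 2:]
        table = {}
        for e1 in combinations(left, w // 2):
            syn = 0
            for j in e1:
                syn ^= H_cols[j]
            table.setdefault(syn & mask, []).append((e1, syn))    # (BA 1)
        for e2 in combinations(right, w - w // 2):
            syn = s
            for j in e2:
                syn ^= H_cols[j]
            for e1, syn1 in table.get(syn & mask, []):            # (BA 2)
                if syn1 == syn:                                   # (BA 3)
                    return sorted(e1 + e2)       # support in original coordinates
    return None
```

Because the supports are kept in the original column indices, no explicit multiplication by $P^T$ is needed on success. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.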


Proposition 1. Under assumptions (B1) and (B2), we have³

$$WF_{BA}(n,r,w) \approx 2L \log(K_0 L) \quad\text{with}\quad L = \min\left( \sqrt{\binom{n}{w}},\; 2^{r/2} \right),$$

and $K_0$ is the cost for executing the instruction (BA 3) (i.e. testing $eH^T = s$).


Remarks

1. When $\binom{n}{w} > 2^r$, the cost will depend on the number of syndromes $2^r$ instead
of the number of words of weight $w$. This corresponds to the case where $w$ is
larger than the Gilbert-Varshamov distance and we have multiple solutions.
We only need one of those solutions and thus the size of the search space is
reduced.

2. It is interesting to note the relatively low impact of $K_0$, the cost of the
test in (BA 3). Between an extremely conservative lower bound of $K_0 = 2$,
an extremely conservative upper bound of $K_0 = wr$ and a more realistic
$K_0 = 2w$ the differences are very small.

3. In the case where $w$ is odd and $\binom{n}{\lfloor w/2 \rfloor} < L$, the formula of Proposition 1 is
only a lower bound. A better estimate would be

$$WF_{BA}(n,r,w) \approx 2L' \log\left( K_0 \frac{L^2}{L'} \right) \quad\text{with}\quad L' = \frac{L^2 + \binom{n}{\lfloor w/2 \rfloor}^2}{2\binom{n}{\lfloor w/2 \rfloor}}. \qquad (2)$$

4. Increasing the size of $|W_1|$ (and $|W_2|$) can be easily and efficiently achieved
by “overlapping” $H_1$ and $H_2$ (see the introduction of this section). More
precisely, we take for $W_1$ all words of weight $w/2$ using only the $n'$ first
coordinates (with $n/2 < n' < n$). Similarly, $W_2$ will use the $n'$ (or more) last
coordinates.


3 Here and after, “log” denotes the base 2 logarithm (and “ln” the Neperian
logarithm).
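
The estimate of Proposition 1 and the observation of Remark 2 are easy to evaluate numerically. The following Python sketch computes $\log_2$ of $2L\log(K_0 L)$; the function name and the choice of the original McEliece parameters ($n = 1024$, $r = 500$, $w = 50$) as an example are our own.

```python
from math import comb, log2, sqrt


def wf_ba_log2(n, r, w, K0):
    """log2 of the estimate 2 L log2(K0 L), with L = min(sqrt(C(n, w)), 2^(r/2))."""
    L = min(sqrt(comb(n, w)), 2.0 ** (r / 2))
    return log2(2 * L * log2(K0 * L))


n, r, w = 1024, 500, 50
low = wf_ba_log2(n, r, w, K0=2)          # extremely conservative lower bound on K0
mid = wf_ba_log2(n, r, w, K0=2 * w)      # more realistic value
high = wf_ba_log2(n, r, w, K0=w * r)     # extremely conservative upper bound on K0
```

For these parameters the three values differ by a small fraction of a bit, which illustrates why the exact value of $K_0$ matters little.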


2.2 Lower Bounds 


As the attacker can make a clever choice of $W_1$ and $W_2$ which may contradict
assumption (B1), we do not want to use it for the lower bound. The result remains
very close to the estimate of the previous section except for the multiplicative
constant, which is $\sqrt{2}$ instead of 2.


Theorem 1. For any fixed value of $n$, $r$ and $w$, we have

$$WF_{BA}(n,r,w) \geq \sqrt{2}\, L \log(K_0 L) \quad\text{with}\quad L = \min\left( \sqrt{\binom{n}{w}},\; 2^{r/2} \right),$$

where $K_0$ is the cost for executing the instruction (BA 3).


3 Information Set Decoding (ISD) 


We will consider here Stern’s algorithm [22], which is the best known decoder 
for cryptographic purposes, and some of its implemented variants by Canteaut- 
Chabaud and Bernstein-Lange-Peters [6]. Our purpose is to present a lower 
bound which takes all known improvements into account. 


3.1 A New Variant of Stern’s Algorithm 


Following other works [15,16], J. Stern describes in [22] an algorithm to find
a word of weight $w$ in a binary linear code of length $n$ and dimension $k$ (and
codimension $r = n - k$). The algorithm uses two additional parameters $p$ and
$\ell$ (both positive integers). We present here a generalized version which acts on
the parity check matrix $H_0$ of the code (instead of the generator matrix). Table 2
describes the algorithm. The partial Gaussian elimination of $H_0 P$ consists in
finding $U$ ($r \times r$ and non-singular) and $H$ (and $H'$) such that⁴

$$U H_0 P = H = \left( \begin{array}{c|c} \begin{array}{c} I_{r-\ell} \\ 0 \end{array} & H' \end{array} \right), \qquad (3)$$

where the left block has $r - \ell$ columns, $H'$ has $k + \ell$ columns and $U$ is a
non-singular $r \times r$ matrix. Let $s = s_0 U^T$. If $e$ is a solution of
CSD$(H, s, w)$ then $eP^T$ is a solution of CSD$(H_0, s_0, w)$. Let $(P, e')$ be the output
of the algorithm, i.e., $\mathrm{wt}(s + e'H'^T) = w - p$, and let $e''$ be the first $r - \ell$ bits of
$s + e'H'^T$; the word $e = (e'' \mid e')$ is a solution of CSD$(H, s, w)$.


Table 2. Generalized ISD algorithm

For any fixed values of $n$, $r$ and $w$, the following algorithm uses four parameters:
two integers $p > 0$ and $\ell > 0$ and two sets $W_1 \subset W_{k+\ell,\lfloor p/2 \rfloor}$ and
$W_2 \subset W_{k+\ell,\lceil p/2 \rceil}$. We denote by $h_\ell(x)$ the last $\ell$ bits of any $x \in \{0,1\}^r$.

procedure ISDecoding
  input: H_0 ∈ {0,1}^{r×n}, s_0 ∈ {0,1}^r
  repeat                                       (MAIN LOOP)
    P ← random n × n permutation matrix
    (H', U) ← PGElim(H_0 P)    // partial elimination as in (3)
    s ← s_0 U^T
    for all e ∈ W_1
      i ← h_ℓ(e H'^T)                          (ISD 1)
      write(e, i)    // store e in some data structure at index i
    for all e_2 ∈ W_2
      i ← h_ℓ(s + e_2 H'^T)                    (ISD 2)
      S ← read(i)    // extract the elements stored at index i
      for all e_1 ∈ S
        if wt(s + (e_1 + e_2) H'^T) = w − p    (ISD 3)
          return (P, e_1 + e_2)                (SUCCESS)


Definition 2. For any fixed value of $n$, $r$ and $w$, we denote by $WF_{ISD}(n,r,w)$ the
minimal binary work factor (average cost in binary operations) of the algorithm of
Table 2 to produce a solution to CSD, for any choices of parameters $\ell$, $p$, $W_1$ and $W_2$.


3.2 Estimation of the Cost of the New Variant 


To evaluate the cost of the algorithm we will assume that only the instructions
(ISD i) are significant. This assumption is stronger than for the birthday attack,
because it means that the Gaussian elimination at the beginning of every (MAIN
LOOP) costs nothing. It is a valid assumption as we only want a lower bound.
Moreover, most of the improvements introduced in [6,10] are meant to reduce
the relative cost of the Gaussian elimination. We claim that within this “free
Gaussian elimination” assumption any lower bound on the algorithm of Table 2
will apply to all the variants of [6,10]. Our estimations will use the following
assumptions:


(I1) For all pairs $(e_1, e_2)$ examined in the algorithm, the sums $e_1 + e_2$ are
uniformly and independently distributed in $W_{k+\ell,p}$.
(I2) The cost of the execution of the algorithm is approximately equal to

$$\ell \cdot \sharp(\text{ISD 1}) + \ell \cdot \sharp(\text{ISD 2}) + K_{w-p} \cdot \sharp(\text{ISD 3}), \qquad (4)$$

where $K_{w-p}$ is the average cost for checking $\mathrm{wt}(s + (e_1 + e_2) H'^T) = w - p$
and $\sharp(\text{ISD } i)$ is the expected number of executions of the instruction (ISD
i) before we meet the (SUCCESS) condition.


4 In the very unlikely event that the first $r - \ell$ columns are linearly dependent, we can
change $P$.

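Proposition 2. Under assumptions (I1) and (I2), if $\binom{n}{w} < 2^r$ (single solution)
or if $\binom{n}{w} > 2^r$ (multiple solutions) and $\binom{r}{w-p}\binom{k}{p} < 2^r$, we have (we recall that
$k = n - r$)

$$WF_{ISD}(n,r,w) \approx \min_p \frac{2\ell \min\left( \binom{n}{w},\, 2^r \right)}{\lambda \binom{r-\ell}{w-p} \sqrt{\binom{k+\ell}{p}}} \quad\text{with}\quad \ell = \log\left( K_{w-p} \sqrt{\binom{k}{p}} \right),$$

with $\lambda = 1 - e^{-1} \approx 0.63$. If $\binom{n}{w} > 2^r$ (multiple solutions) and $\binom{r}{w-p}\binom{k}{p} > 2^r$, we
have

$$WF_{ISD}(n,r,w) \approx \min_p \frac{2\ell\, 2^{r/2}}{\sqrt{\binom{r-\ell}{w-p}}} \quad\text{with}\quad \ell = \log\left( \frac{K_{w-p}\, 2^{r/2}}{\sqrt{\binom{r}{w-p}}} \right).$$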

Remarks 


1. For a given set of parameters the expected number of executions of (MAIN
LOOP) is $N = 1/(1 - \exp(-X))$ where $X = \binom{r-\ell}{w-p}\binom{k+\ell}{p} / \min\left(2^r, \binom{n}{w}\right)$.

2. The second formula applies when $X \gg 1$, that is when the expected number
of executions of (MAIN LOOP) is (not much more than) one. In that case, as
for the birthday attack, the best strategy is to use $W_2 = W_{k+\ell,\lceil p/2 \rceil}$ (i.e. as
large as possible) and $W_1$ as small as possible but large enough to have
only one execution of (MAIN LOOP) with probability close to 1.

3. When $X \ll 1$, we have $N = 1/(1 - \exp(-X)) \approx 1/X$ and the first formula
applies.

4. When $X < 1$, the first formula still gives a good lower bound, but it is less
tight when $X$ gets closer to 1.

5. When $p$ is small and odd the above estimates for $\mathrm{WF}_{\mathrm{ISD}}$ are not always
accurate. The adjustment is similar to what we have in (2) (see the remarks
following the birthday decoder estimation). In practice, if $\binom{k+\ell}{\lfloor p/2 \rfloor} < \sqrt{\binom{k+\ell}{p}}$
it is probably advisable to discard this odd value of $p$.


Security Bounds for the Design of Code-Based Cryptosystems 95 


6. We use the expression $\ell = \log(K_{w-p} L_p(0))$ for the optimal value of $\ell$ (where
$L_p(\ell) = \sqrt{\binom{k+\ell}{p}}$ or $L_p(\ell) = 2^{r/2} / \sqrt{\binom{r-\ell}{w-p}}$ respectively in the first case or in
the second case of the Proposition). In fact a better value would be a fixpoint
of the mapping $\ell \mapsto \log(K_{w-p} L_p(\ell))$. In practice $L_p(0)$ is a very good approximation.
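The two estimates of Proposition 2 can be evaluated numerically for one fixed value of $p$. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the value passed for $K_{w-p}$ is an assumption (this section does not fix it), $\ell$ is rounded to an integer, and the switch between the two cases uses the crude threshold $X > 1$ instead of $X \gg 1$. It also assumes parameters for which all binomial coefficients involved are non-zero.

```python
import math

LAMBDA = 1 - math.exp(-1)  # the constant lambda = 1 - e^-1 of Proposition 2


def log2_binom(n, k):
    """Return log2 of the binomial coefficient C(n, k)."""
    return math.log2(math.comb(n, k))


def isd_bound_log2(n, r, w, p, cost_k):
    """Return log2 of the work-factor estimate for one fixed p.

    cost_k plays the role of K_{w-p}; its value is an assumption of this sketch.
    """
    k = n - r
    log_nw = log2_binom(n, w)
    log_min = min(log_nw, r)  # log2 of min(C(n, w), 2^r)
    # First-case choice of ell: log2(K * sqrt(C(k, p))), rounded to an integer.
    ell = max(1, round(math.log2(cost_k) + 0.5 * log2_binom(k, p)))
    # log2 of X = C(r - ell, w - p) * C(k + ell, p) / min(2^r, C(n, w)).
    log_x = log2_binom(r - ell, w - p) + log2_binom(k + ell, p) - log_min
    if log_nw > r and log_x > 0:
        # Second case: multiple solutions and about one main loop expected.
        ell = max(1, round(math.log2(cost_k) + r / 2
                           - 0.5 * log2_binom(r, w - p)))
        return math.log2(2 * ell) + r / 2 - 0.5 * log2_binom(r - ell, w - p)
    # First case: single solution, or many main loops expected.
    return (math.log2(2 * ell) + log_min - math.log2(LAMBDA)
            - log2_binom(r - ell, w - p) - 0.5 * log2_binom(k + ell, p))
```

Working with base-2 logarithms keeps the arithmetic in floating-point range even though the binomial coefficients themselves are huge; the minimum over $p$ is then a plain `min` over the values returned for each candidate $p$.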


3.3 Gain Compared with Stern’s Algorithm 


Stern's algorithm corresponds to a complete Gaussian elimination and to a
particular choice of $W_1$ and $W_2$ in the algorithm of Table 2. A full Gaussian
elimination is applied to the permuted matrix $H_0P$ and we get $U$ and $H'$ such that:


$$U H_0 P = H' = \left( I_r \mid R \right),$$

where $I_r$ is the $r \times r$ identity, $R$ has $k$ columns, and $(H_1 \mid H_2)$ denotes a slice of
$\ell$ rows of $R$ split into two halves of $k/2$ columns.


The $\ell$-bit collision search is performed on $k$ columns; moreover $p$ is always even
and $W_1$ and $W_2$ will use $p/2$ columns of $H_1$ and $H_2$ respectively. The published
variants consist in reducing the cost of the Gaussian elimination, or, for the
same $H'$, in using different “slices” $(H_1 \mid H_2)$ of $\ell$ rows. All other improvements
lead to an operation count which is close to what we have in (4). The following
formula, obtained with the techniques of the previous section, gives a tight lower
bound for all those variants.


$$\mathrm{WF}_{\mathrm{Stern}}(n,r,w) \approx \min_p \frac{2\ell \min\left(\binom{n}{w}, 2^r\right)}{\binom{r-\ell}{w-p}\binom{k/2}{p/2}} \quad \text{with } \ell = \log\left(K_{w-p}\binom{k/2}{p/2}\right).$$


The gain of the new version of ISD is $\approx \lambda\sqrt[4]{\pi p/2}$, which is rather small in
practice and corresponds to the improvement of the “birthday paradox” part of
the algorithm.
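The stated gain can be checked numerically: up to the factor $\lambda$, it is the ratio between $\sqrt{\binom{k}{p}}$ (new variant) and $\binom{k/2}{p/2}$ (Stern), ignoring the small shift by $\ell$. The sketch below, with hypothetical function names, compares that exact ratio with the closed form $\sqrt[4]{\pi p/2}$.

```python
import math


def list_size_ratio(k, p):
    """Return sqrt(C(k, p)) / C(k/2, p/2) for even k and even p."""
    log_ratio = (0.5 * math.log2(math.comb(k, p))
                 - math.log2(math.comb(k // 2, p // 2)))
    return 2.0 ** log_ratio


def gain_approximation(p):
    """Return the closed-form approximation (pi * p / 2) ** (1/4)."""
    return (math.pi * p / 2) ** 0.25


if __name__ == "__main__":
    for k, p in [(524, 4), (2000, 20)]:
        print(k, p, list_size_ratio(k, p), gain_approximation(p))
```

The approximation is an asymptotic one (it comes from the central term of a hypergeometric distribution), so a deviation of a few percent for small $p$ is expected.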


4 Generalized Birthday Algorithm (GBA) 


4.1 General Principle 


The generalized birthday technique is particularly efficient for solving Syndrome 
Decoding-like problems with a large number of solutions. Suppose one has to 
solve the following problem: 


Problem 2. Given a function $f : \mathbb{N} \to \{0,1\}^r$ and an integer $a$, find a set of
$2^a$ indexes $x_i$ such that:

$$\bigoplus_{i=0}^{2^a-1} f(x_i) = 0.$$

In this problem, $f$ will typically return the $x_i$-th column of a binary matrix $H$.
Note that, here, $f$ is defined upon an infinite set, meaning that there are an
infinity of solutions. To solve this problem, the Generalized Birthday Algorithm
(GBA) does the following:


– build $2^a$ lists $L_0, \ldots, L_{2^a-1}$, each containing $2^{\frac{r}{a+1}}$ different vectors $f(x_i)$
– pairwise merge lists $L_{2j}$ and $L_{2j+1}$ to obtain $2^{a-1}$ lists $L'_j$ of XORs of 2
vectors $f(x_i)$. Only keep XORs of 2 vectors starting with $\frac{r}{a+1}$ zeros. On
average, the lists $L'_j$ will contain $2^{\frac{r}{a+1}}$ elements.
– pairwise merge the new lists $L'_{2j}$ and $L'_{2j+1}$ to obtain $2^{a-2}$ lists $L''_j$ of XORs
of 4 vectors $f(x_i)$. Only keep XORs of 4 vectors starting with $2\frac{r}{a+1}$ zeros.
On average, the lists $L''_j$ will still contain $2^{\frac{r}{a+1}}$ elements.
– continue these merges until only 2 lists remain. These 2 lists will be composed
of $2^{\frac{r}{a+1}}$ XORs of $2^{a-1}$ vectors $f(x_i)$ starting with $(a-1)\frac{r}{a+1}$ zeros.
– as only $2\frac{r}{a+1}$ bits of the previous vectors are non-zero, a simple application of
the standard birthday technique is enough to obtain 1 solution (on average).
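The merge tree described above can be sketched on toy parameters. This is an illustration under stated assumptions, not the authors' code: the names `merge` and `gba` are hypothetical, vectors are stored as Python integers, the bits forced to zero are the low-order ones (the text speaks of vectors "starting with" zeros, which only changes the bit ordering), and $r$ is assumed divisible by $a + 1$.

```python
from collections import defaultdict


def merge(list_a, list_b, mask):
    """Join two lists of (value, indexes) pairs on the bits selected by mask.

    Only XORs whose masked bits are all zero are kept.
    """
    buckets = defaultdict(list)
    for value, indexes in list_a:
        buckets[value & mask].append((value, indexes))
    merged = []
    for value, indexes in list_b:
        for other_value, other_indexes in buckets.get(value & mask, ()):
            merged.append((value ^ other_value, other_indexes + indexes))
    return merged


def gba(f, r, a, list_size):
    """Return tuples of 2^a indexes whose images under f XOR to zero."""
    step = r // (a + 1)  # number of bits cancelled at each level
    lists = [
        [(f(j * list_size + i), (j * list_size + i,)) for i in range(list_size)]
        for j in range(2 ** a)
    ]
    level = 1
    while len(lists) > 2:
        mask = (1 << (level * step)) - 1
        lists = [merge(lists[2 * j], lists[2 * j + 1], mask)
                 for j in range(len(lists) // 2)]
        level += 1
    # Final birthday step: the two remaining lists must agree on all r bits.
    return [indexes for value, indexes in merge(lists[0], lists[1], (1 << r) - 1)]
```

With `list_size` equal to $2^{r/(a+1)}$ one solution is expected on average; choosing larger lists, as a test would, simply yields more solutions.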


As all the lists manipulated in this algorithm are of the same size, the
complexity of the algorithm is easy to compute: $2^a - 1$ merge operations have to
be performed, each of them requiring to sort a list of size $2^{\frac{r}{a+1}}$. The complexity
is thus $O\left(2^a \frac{r}{a+1} 2^{\frac{r}{a+1}}\right)$. For simplicity we will only consider a lower bound of the
effective complexity of the algorithm: if we denote by $L$ the size of the largest
list in the algorithm, the complexity is lower-bounded by $O(L \log L)$. This gives
a complexity of $O\left(\frac{r}{a+1} 2^{\frac{r}{a+1}}\right)$.


Minimal Memory Requirements. The minimal memory requirements for
this algorithm are not as easy to compute. If all the lists are chosen to be of the
same size (as in the description of the algorithm we give), then it is possible to
compute the solution by storing at most $a$ lists at a time in memory. This gives
us a memory complexity of $O\left(a\, 2^{\frac{r}{a+1}}\right)$. However, the starting lists can also be
chosen of different sizes so as to store only smaller lists.

In practice, for each merge operation, only one of the two lists has to be stored
in memory; the second one can always be computed on the fly. As a consequence,
looking at the tree of all merge operations (see Fig. 1), half the lists of the tree
can be computed on the fly (the lists in dashed line circles). Let $L = 2^{\frac{r}{a+1}}$ and
suppose one wants to use the Generalized Birthday Algorithm storing only lists
of size $L/\lambda$ for a given $\lambda$. Then, in order to get, on average, a single solution in the
end, the lists computed on the fly should be larger. For instance, in the example
of Fig. 1 one should have:

— |L] = AL, |L4| = A7L, and |L7| = 82, 
— |L4| = L and |L3| = AL, 
= |Ll = L and |Ls] =i, 


In the general case this gives us a time/memory tradeoff when using GBA: 
one can divide the memory complexity by A at the cost of an increase in time 
complexity by a factor \*. However, many other combinations are also possible 
depending on the particular problem one has to deal with. 


Fig. 1. Merge operations in the Generalized Birthday Algorithm. All lists in dashed 
line circles can be computed on the fly. 


4.2 GBA under Constraints 


In the previous section, we presented a version of GBA where the number of 
vectors available was unbounded and where the number of vectors to XOR was 
a power of 2. In practice, when using GBA to solve instances of the CSD problem 
only n different r-bit vectors are available and w can be any number. We thus 
consider an idealized version of GBA so as to bound the complexity of “real 
world” GBA. The bounds we give are not always very tight. See for instance [7] 
for the analysis of a running implementation of GBA under realistic constraints. 

If $w$ is not a power of 2, some of the starting lists $L_j$ should contain vectors
$f(x_i)$ and others XORs of 2 or more vectors $f(x_i)$. We consider that the starting
lists all contain XORs of $\frac{w}{2^a}$ vectors $f(x_i)$, even if this is not an integer. This
will give the most time efficient algorithm, but will of course not be usable in
practice.

The length $n$ of the matrix limits the size of the starting lists. For GBA to
find one solution on average, one needs lists $L_j$ of size $2^{\frac{r}{a+1}}$. As the starting lists
contain XORs of $\frac{w}{2^a}$ vectors, we need $\binom{n}{w/2^a} \ge 2^{\frac{r}{a+1}}$. However, this constraint on
$a$ is not sufficient: if all the starting lists contain the same vectors, all XORs will
be found many times and the probability of success will drop. To avoid this, we
need lists containing different vectors and this can be done by isolating the first
level of merges.


– first we select $2^{a-1}$ distinct vectors $s_j$ of $a$ bits such that $\bigoplus s_j = 0$.
– then we pairwise merge lists $L_{2j}$ and $L_{2j+1}$ to obtain lists $L'_j$ containing
elements having their $a$ first bits equal to $s_j$.


After this first round, we have $2^{a-1}$ lists of XORs of $\frac{2w}{2^a}$ vectors such that, if we
XOR the $a$ first bits of one element from each list we obtain 0. Also, all the lists
contain only distinct elements, which means we are back in the general case of
GBA, except we now have $2^{a-1}$ lists of vectors of length $r - a$. These lists all
have a maximum size $L = \frac{1}{2^a}\binom{n}{2w/2^a}$ and can be obtained from starting lists $L_j$ of
size $\sqrt{\binom{n}{2w/2^a}}$ (see Sect. 2). We get the following constraint on $a$:

$$\frac{1}{2^a}\binom{n}{\frac{2w}{2^a}} \ge 2^{\frac{r-a}{a}}. \qquad (6)$$
Fig. 2. Logarithm of the complexity of the Generalized Birthday Algorithm for given
n and r when w varies. (a) with no optimization, (b) when the lists are initialized with
shortened vectors, and (c) when a is not an integer.


In practice, after the first level of merges we are not exactly in the general
case of GBA: if, for example, $s_0 \oplus s_1 = s_2 \oplus s_3$, after the second merges, lists $L''_0$
and $L''_1$ would contain exactly the same elements. This can be avoided by using
another set of target values $s'_j$ such that $\bigoplus s'_j = 0$ for the second level of merges
(as for the first level) and so on for the subsequent levels of merges (except the
last two levels).


Using Non-Integer Values for a. Equation (6) determines the largest
possible value of $a$ that can be used with GBA. For given $n$ and $r$, if $w$ varies,
the complexity of the algorithm will thus have a stair-like shape (see Fig. 2(a)).
The left-most point of each step corresponds to the case where Equation (6) is
an equality. However, when it is not an equality, it is possible to gain a little:
instead of choosing values $s_j$ of $a$ bits one can use slightly larger values and
thus start the second level of merges with shorter vectors. This gives a
broken-line complexity curve (see Fig. 2(b)). This is somewhat similar to what Minder
and Sinclair denote by “extended k-tree algorithm” [19]. In practice, this is
almost equivalent to using non-integer values for $a$ (see Fig. 2(c)). We will thus
assume that in GBA, $a$ is a real number, chosen such that Equation (6) is an
equality.


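Proposition 3. We can lower bound the binary work factor $\mathrm{WF}_{\mathrm{GBA}}(n,r,w)$ of
GBA applied to solving an instance of CSD with parameters $(n, r, w)$ by:

$$\mathrm{WF}_{\mathrm{GBA}}(n,r,w) \ge \frac{r-a}{a}\, 2^{\frac{r-a}{a}}, \quad \text{with } a \text{ such that } \frac{1}{2^a}\binom{n}{\frac{2w}{2^a}} = 2^{\frac{r-a}{a}}.$$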
Note that this gives us a bound on the minimal time complexity of GBA but 
does not give any bound on the memory complexity of the algorithm. Also, this 
bound is computed using an idealized version of the algorithm: one should not 
expect to achieve such a complexity in practice, except in some cases where a is 
an integer and w a power of 2. 
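Since $a$ is taken to be a real number, the bound of Proposition 3 has to be evaluated numerically. The sketch below is one possible way to do it, under assumptions that are not in the text: the function names are hypothetical, the binomial coefficient is extended to a real lower argument through the Gamma function, and the root is located by scanning integer values of $a$ upwards and then bisecting.

```python
import math


def log2_binom_real(n, x):
    """log2 of C(n, x), extended to real x through the Gamma function."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1)
            - math.lgamma(n - x + 1)) / math.log(2)


def constraint_gap(n, r, w, a):
    """log2 of (1/2^a) C(n, 2w/2^a) minus (r - a)/a; >= 0 when (6) holds."""
    return log2_binom_real(n, 2 * w / 2 ** a) - a - (r - a) / a


def gba_bound_log2(n, r, w, tol=1e-9):
    """Return (a, log2 of the lower bound), a making constraint (6) an equality."""
    low = 1.0
    if constraint_gap(n, r, w, low) < 0:
        raise ValueError("the constraint cannot be met with a >= 1")
    high = low
    while constraint_gap(n, r, w, high) >= 0:  # find the first integer failing (6)
        low = high
        high += 1.0
    while high - low > tol:  # bisect between the last success and first failure
        mid = (low + high) / 2
        if constraint_gap(n, r, w, mid) >= 0:
            low = mid
        else:
            high = mid
    exponent = (r - low) / low
    return low, math.log2(exponent) + exponent
```

The upward scan always terminates because the binomial term tends to 1 as $a$ grows while the right-hand side of (6) stays above 1.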
5 Case Studies 


Now that we have given some bounds on the complexities of the best algorithms 
to solve CSD problems, we propose to study what happens when using them to 
attack existing constructions. 

Note that in this section, as in the whole paper, we only consider the resis- 
tance to decoding attacks. Code-based cryptosystems may also be vulnerable 
to structural attacks. However, no efficient structural attack is known for bi- 
nary Goppa codes (McEliece encryption and CFS signature) or for prime order 
random quasi-cyclic codes (FSB hash function). 


5.1 Attacking the McEliece Cryptosystem 


In the McEliece [I7] and Niederreiter BPI] cryptosystems the security relies on two 
different problems: recovering the private key from the public key and decrypting 
an encrypted message. Decrypting consists in finding an error pattern e of weight 
$w$, such that $e \times H^T = c$ where $H$ is a binary matrix derived from the public key
and c is a syndrome derived from the encrypted message one wants to decrypt. 
Here, we suppose that the structural attack consisting in recovering the private 
key is infeasible and can assume that H is a random binary matrix. Decryption 
thus consists in solving an instance of the CSD problem where one knows that 
one and only one solution exists. 
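To make the decoding problem concrete, the following toy sketch solves a tiny CSD instance by exhaustive search. It is only an illustration of the problem statement, far from the ISD algorithms analysed in this paper; the function names are hypothetical and the columns of $H$ are represented as Python integers.

```python
import itertools


def syndrome(columns, support):
    """XOR of the columns of H indexed by support, i.e. e * H^T for that e."""
    acc = 0
    for j in support:
        acc ^= columns[j]
    return acc


def solve_csd_bruteforce(columns, target, w):
    """Return the support of a weight-w word with syndrome target, or None."""
    for support in itertools.combinations(range(len(columns)), w):
        if syndrome(columns, support) == target:
            return support
    return None
```

The search visits up to $\binom{n}{w}$ supports, which is exactly the quantity that the birthday, ISD and GBA techniques discussed here are designed to avoid enumerating.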

Having a single solution rules out any attempt to use GBA, or at least, any
attempt to use GBA would consist in using the classical birthday attack. For this
reason the best attacks against the McEliece and Niederreiter cryptosystems
are all based on ISD. Table 3 gives the work factors we obtain using our bound
from Sect. 3. For the classical McEliece parameters (10,50) this bound can be
compared to the work factors computed by non-idealized algorithms. Canteaut
and Chabaud [10] obtained a work factor of $2^{64.2}$ and Bernstein, Lange and
Peters [6] a work factor of $2^{60.5}$. As one can see, the gap between our bound and
their complexities is very small, indicating two things:


— our bound on ISD is tight when evaluating the practical security of some 
McEliece parameters, 

— the best ISD-based algorithms are sufficiently advanced to make our assump- 
tion that Gaussian elimination is free almost realistic. Almost no margin is 
left for these techniques to improve and better attacks will need to introduce 
new methods. 


5.2 Attacking the CFS Signature Scheme 


The attack we present here is due to Daniel Bleichenbacher, but was never pub- 
lished. We present what he explained through private communication including 
a few additional details. 

The CFS signature scheme is based on the Niederreiter cryptosystem:
signing a document requires hashing it into a syndrome and then trying to decode
this syndrome. However, for a Goppa code correcting $w$ errors, only a fraction
$\frac{1}{w!}$ of the syndromes are decodable. Thus, a counter is appended to the message
and the signer tries successive counter values until one hash is decodable. The
signature consists of both the error pattern of weight $w$ corresponding to the
syndrome and the value of the counter giving this syndrome.


Table 3. Work factors for the ISD lower-bound we computed for some typical
McEliece/Niederreiter parameters. The code has length $n = 2^m$ and codimension
$r = mw$ and corrects $w$ errors.


optimal p | optimal ℓ | binary work factor
4 | 22 |
6 | 33 |
10 | 54 |

Attacking this construction consists in forging a valid signature for a chosen 
message. One must find a matching counter and error pattern for a given doc- 
ument. This looks a lot like a standard CSD problem instance. However, here 
there is one major difference with the case of McEliece or Niederreiter: instead of 
having one instance to solve, one now needs to solve one instance among many 
instances. One chooses a document and hashes it with many different counters 
to obtain many syndromes: each syndrome corresponds to a different instance. 
It has no importance which instance is solved, each of them can give a valid 
“forged” signature. 

For ISD algorithms, having multiple instances available is of little help;
however, for GBA, this gives us one additional list. Even though Goppa code
parameters are used and an instance has less than one solution on average, this additional
list makes the application of GBA with $a = 2$ possible. This will always be an
“unbalanced” GBA working as follows:


– first, build 3 lists $L_0$, $L_1$, and $L_2$ of XORs of respectively $w_0$, $w_1$ and $w_2$
columns of $H$ (with $w = w_0 + w_1 + w_2$). These lists can have a size up to
$\binom{n}{w_i}$ but smaller sizes can be used,

– merge the two lists $L_0$ and $L_1$ into a list $L'_0$ of XORs of $w_0 + w_1$ columns of
$H$, keeping only those starting with $\lambda$ zeros (we will determine the optimal
choice for $\lambda$ later). $L'_0$ contains $\frac{1}{2^\lambda}\binom{n}{w_0+w_1}$ elements on average.

– All the following computations are done on the fly and additional lists do
not have to be stored. Repeat the following steps:

• choose a counter and compute the corresponding document hash (an
element of the virtual list $L_3$),

• XOR this hash with all elements of $L_2$ matching on the first $\lambda$ bits (to
obtain elements of the virtual list $L'_1$),

• look up each of these XORs in $L'_0$: any complete match gives a valid
signature.
The number $L$ of hashes one will have to compute on average is such that:

$$\frac{1}{2^\lambda}\binom{n}{w_0+w_1} \times \frac{L}{2^\lambda}\binom{n}{w_2} = 2^{r-\lambda} \iff L = \frac{2^{r+\lambda}}{\binom{n}{w_0+w_1}\binom{n}{w_2}}.$$

The memory requirements for this algorithm correspond to the size of the largest
list stored. In practice, the first level lists $L_i$ can be chosen so that $L'_0$ is always
the largest, and the memory complexity is $\frac{1}{2^\lambda}\binom{n}{w_0+w_1}$. The time complexity
corresponds to the size of the largest list manipulated: $\max\left(\frac{1}{2^\lambda}\binom{n}{w_0+w_1},\ L,\ \frac{L}{2^\lambda}\binom{n}{w_2}\right)$.
The optimal choice is always to choose $w_0 = \lceil w/3 \rceil$, $w_2 = \lfloor w/3 \rfloor$, and $w_1 =
w - w_0 - w_2$. Then, two different cases can occur: either $L'_1$ is the largest list,
or one of $L'_0$ and $L_3$ is. If $L'_1$ is the largest, we choose $\lambda$ so as to have a smaller
list $L'_0$ and so a smaller memory complexity. Otherwise, we choose $\lambda$ so that
$L'_0$ and $L_3$ are of the same size to optimize the time complexity. Let $T$ be the
size of the largest list we manipulate and $M$ the size of the largest list we
store. The algorithm has time complexity $O(T \log T)$ and memory complexity
$O(M \log M)$ with:

$$\text{if } \frac{2^r}{\binom{n}{w - \lfloor w/3 \rfloor}} \ge \sqrt{\frac{2^r}{\binom{n}{\lfloor w/3 \rfloor}}} \text{ then } T = \frac{2^r}{\binom{n}{w - \lfloor w/3 \rfloor}} \text{ and } M = \frac{\binom{n}{w - \lfloor w/3 \rfloor}}{\binom{n}{\lfloor w/3 \rfloor}},$$

$$\text{else } T = M = \sqrt{\frac{2^r}{\binom{n}{\lfloor w/3 \rfloor}}}.$$
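The case distinction above translates directly into a few lines of code. The sketch below assumes Goppa code parameters $n = 2^m$, $r = mw$ (as in Table 4) and the optimal split $w_2 = \lfloor w/3 \rfloor$; the function names are hypothetical, and everything is computed as base-2 logarithms.

```python
import math


def cfs_attack_log2(m, w):
    """Return (log2 T, log2 M) for the unbalanced GBA against CFS."""
    n, r = 2 ** m, m * w
    w2 = w // 3
    log_a = math.log2(math.comb(n, w - w2))  # lists L0 and L1 combined
    log_b = math.log2(math.comb(n, w2))      # list L2
    on_the_fly = r - log_a                   # log2 of 2^r / C(n, w - w2)
    balanced = (r - log_b) / 2               # log2 of sqrt(2^r / C(n, w2))
    if on_the_fly >= balanced:
        return on_the_fly, log_a - log_b
    return balanced, balanced


def with_log_factor(log2_size):
    """log2 of X * log2(X), i.e. the O(X log X) cost of handling a list."""
    return log2_size + math.log2(log2_size)
```

Applying `with_log_factor` to both returned values gives the $O(T \log T)$ and $O(M \log M)$ figures, which is presumably how the time/memory entries of a table such as Table 4 would be obtained.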


This algorithm is realistic in the sense that only integer values are used,
meaning that effective attacks should have time/memory complexities close to
those we present in Table 4. Of course, for a real attack, other time/memory
tradeoffs might be more advantageous, resulting in other choices for $\lambda$ and the $w_i$.


5.3 Attacking the FSB Hash Function 


FSB [1] is a candidate for the SHA-3 hash competition. The compression
function of this hash function consists in converting the input into a low weight word
and then multiplying it by a binary matrix $H$. This is exactly a syndrome
computation and inverting this compression function requires solving an instance


Table 4. Time/memory complexities of Bleichenbacher’s attack against the CFS
signature scheme. The parameters are Goppa code parameters so $r = mw$ and $n = 2^m$.


510/9510 900-2 1923.3 563-1 7996.2 967.2 1981-2 981:9 9948 
954.1 954.1 983.3 946.5 966.2 960.0 971.3 971.3 985.6 959.0 
957.2 957.2 986.4 949.6 989.3 964.2 975.4 975.4 989.7 963.1 
960.3 980.3 989.5 952.7 972.4 968.2 979.5 979.5 993.7 967.2 


963.3 983.3 972.5 955.7 975-4 972.3 983.6 983.6 997.8 971.3 
900:2 [geet 915:8 /g08:8 gf8-2 [QT 4 987-8.) 987.8 2101:9 /975:4 
969.5 989.5 978.7 961.9 981.5 980.5 991.7 991.7 9105.9 979.5 
Qi -8 (gi? © Delf 1982.0 294:6 9848 298:9 1922-8 a /g8s:6 


102 M. Finiasz and N. Sendrier 


Table 5. Complexities of the ISD and GBA bounds we propose for the official FSB
parameters


parameters | inversion ISD | inversion GBA | collision ISD | collision GBA
FSB160 | | | |
FSB224 | | | |
FSB256 | | | |
FSB384 | | | |
FSB512 | | | |

of the CSD problem. Similarly, finding a collision on the compression function 
requires to find two low weight words having the same syndrome, that is, a word 
of twice the Hamming weight with a null syndrome. In both cases, the security of 
the compression function (and thus of the whole hash function) can be reduced 
to the hardness of solving some instances of the CSD problem. For inversion (or 
second preimage), the instances are of the form CSD(H, w, s) and, for collision, 
of the form CSD(H, 2w, 0). 

Compared to the other code-based cryptosystems we presented, here, the
number of solutions to these instances is always very large: we are studying a
compression function, so there are a lot of collisions, and each syndrome has a lot
of inverses. For this reason, both ISD and GBA based attacks can be used. Which
of the two approaches is the most efficient depends on the parameters. However,
for the parameters proposed in [1], ISD is always the best choice for collision
search and GBA the best choice for inversion (or second preimage). Table 5
contains the attack complexities given by our bounds for the proposed FSB
parameters. As one can see, the complexities obtained with GBA for inversion
are lower than the standard security claim. Unfortunately this does not give an
attack on FSB for several reasons: the version of GBA we consider is idealized
and using non-integer values of $a$ is not practical, but most importantly, the
input of the compression function of FSB is not any word of weight $w$, but only regular
words, meaning that the starting lists for GBA will be much smaller in practice,
yielding a smaller $a$ and higher complexities.


Conclusion 


In this article we have reviewed the two main families of algorithms for solving 
instances of the CSD problem. For each of these we have discussed possible 
tweaks and described idealized versions of the algorithms covering those tweaks. 
The work factors we computed for these idealized versions are lower bounds 
on the effective work factor of existing real algorithms, but also on the future 
improvements that could be implemented. Solving CSD more efficiently than 
these bounds would require to introduce new techniques, never applied to code- 
based cryptosystems. 
For these reasons, the bounds we give can be seen as a tool one can use to
select parameters for code-based cryptosystems. We hope they can help other
designers choose durable parameters with more ease.
A Comments on the Assumptions


We have assumed the following in Sect. 2:


(B1) For all pairs $(e_1, e_2)$ examined in the algorithm, the sums $e_1 + e_2$ are
uniformly and independently distributed in $W_{n,w}$.
(B2) The cost of the execution of the algorithm is approximately equal to


$$\ell \cdot \sharp(\text{BA 1}) + \ell \cdot \sharp(\text{BA 2}) + K_0 \cdot \sharp(\text{BA 3}),$$


where $K_0$ is the cost for testing $e_1H^T = s + e_2H^T$ given that $h_\ell(e_1H^T) =
h_\ell(s + e_2H^T)$ and $\sharp(\text{BA } i)$ is the expected number of executions of the
instruction (BA i) before we meet the (SUCCESS) condition.


The first assumption has to do with the way the attacker chooses the sets $W_1$
and $W_2$. In the version presented at the beginning of Sect. 2 they use different
sets of columns and thus all pairs $(e_1, e_2)$ lead to different words $e = e_1 + e_2$.
When $W_1$ and $W_2$ increase, there is some waste, that is, some words $e = e_1 + e_2$
are obtained several times. A clever choice of $W_1$ and $W_2$ may decrease this
waste, but this seems exceedingly difficult. The “overlapping” approach
is easy to implement and behaves (almost) as if $W_1$ and $W_2$ were random (it
is even sometimes slightly better). The second assumption counts only $\ell$ binary
operations to perform the sum of $w/2$ columns of $\ell$ bits. This can be achieved by
a proper scheduling of the loops and by keeping partial sums. This was described
and implemented in [6]. We also neglect the cost of control and memory handling
instructions. This is certainly optimistic but on modern processors most of those
costs can be hidden in practice. The present work is meant to give security levels
rather than cryptanalysis costs, so we want our estimates to be as
implementation independent as possible.
Similar comments apply to the assumptions (I1) and (I2) of Sect. 3.


B A Sketch of the Proof of Proposition 2


We provide here some clues for the proof of Proposition 2. More details on this
proof and on the proofs of the other results of this paper can be found in the
extended version [14].


Proof. (of Proposition 2, sketch) In one execution of (MAIN LOOP) we examine
$\lambda(z)\binom{k+\ell}{p}$ distinct values of $e_1 + e_2$, where $z = |W_1||W_2| / \binom{k+\ell}{p}$ and $\lambda(z) =
1 - \exp(-z)$. The probability for one particular element of $W_{k+\ell,p}$ to lead to a
solution is

$$P = \frac{\binom{r-\ell}{w-p}}{\min\left(\binom{n}{w}, 2^r\right)}.$$

Thus the probability for one execution of (MAIN LOOP) to lead to (SUCCESS) is

$$P_p(\ell) = 1 - (1 - P)^{\lambda(z)\binom{k+\ell}{p}} \approx 1 - \exp\left(-\frac{\lambda(z)}{N_p(\ell)}\right) \quad \text{where } N_p(\ell) = \frac{\min\left(\binom{n}{w}, 2^r\right)}{\binom{r-\ell}{w-p}\binom{k+\ell}{p}}.$$

When $N_p(\ell)$ is large (much larger than 1), we have $P_p(\ell) \approx \lambda(z)/N_p(\ell)$ and a
good estimate for the cost is

$$\frac{N_p(\ell)}{\lambda(z)} \left( \ell|W_1| + \ell|W_2| + K_{w-p}\frac{z\binom{k+\ell}{p}}{2^\ell} \right).$$

Choosing $|W_1|$, $|W_2|$, $\ell$ and $z$ which minimize this formula leads to the first
formula of the statement.

Else we have $N_p(\ell) < 1$ and the expected number of executions of (MAIN
LOOP) is not much higher than one (obviously it cannot be less). In that case
we are in a situation very similar to a birthday attack in which the list size is
$L = \sqrt{1/P} = 2^{r/2} / \sqrt{\binom{r-\ell}{w-p}}$. This gives a cost of $2\ell L$ with $\ell = \log(K_{w-p}L)$, which has to
be minimized in $\ell$, leading to the second formula of the statement.
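The minimization step of the first case can be made explicit. The following is a sketch of one way to carry it out, under the simplifying assumptions (not stated in the sketch above) that the two lists have equal size and that $z = 1$; write $B = \binom{k+\ell}{p}$.

```latex
% Assumption: |W_1| = |W_2| = \sqrt{zB}, so that |W_1||W_2| = zB.
\[
  \mathrm{cost}(z,\ell) \;=\; \frac{N_p(\ell)}{\lambda(z)}
  \left( 2\ell\sqrt{zB} + K_{w-p}\,\frac{zB}{2^{\ell}} \right).
\]
% Balancing the two terms up to the factor 2\ell, i.e. 2^{\ell} = K_{w-p}\sqrt{zB}:
\[
  \mathrm{cost}(z,\ell) \;=\; \frac{N_p(\ell)}{\lambda(z)}\,(2\ell+1)\sqrt{zB}
  \;\approx\; \frac{\sqrt{z}}{\lambda(z)} \cdot
  \frac{2\ell\,\min\!\left(\binom{n}{w},2^{r}\right)}
       {\binom{r-\ell}{w-p}\sqrt{\binom{k+\ell}{p}}}.
\]
% Taking z = 1 gives \sqrt{z}/\lambda(z) = 1/(1-e^{-1}), which is the factor
% 1/\lambda of the first formula of Proposition 2.
```

The factor $\sqrt{z}/\lambda(z)$ is minimal slightly above $z = 1$ (where $e^z - 1 = 2z$), but its value there differs from the value at $z = 1$ by about one percent, so $z = 1$ is a convenient choice.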
Abstract. In this work, we apply the rebound attack to the AES based
SHA-3 candidate LANE. The hash function LANE uses a permutation
based compression function, consisting of a linear message expansion
and 6 parallel lanes. In the rebound attack on LANE, we apply several
new techniques to construct a collision for the full compression function
of LANE-256 and LANE-512. Using a relatively sparse truncated
differential path, we are able to solve for a valid message expansion and
colliding lanes independently. Additionally, we are able to apply the inbound
phase more than once by exploiting the degrees of freedom in the parallel
AES states. This allows us to construct semi-free-start collisions for full
LANE-256 with $2^{96}$ compression function evaluations and $2^{88}$ memory,
and for full LANE-512 with $2^{224}$ compression function evaluations and
$2^{128}$ memory.


Keywords: SHA-3, LANE, hash function, cryptanalysis, rebound at- 
tack, semi-free-start collision. 


1 Introduction 


In the last few years the cryptanalysis of hash functions has become an important 
topic within the cryptographic community. The attacks on the MD4 family of 
hash functions (MD5, SHA-1) have especially weakened the confidence in the 
security of this design strategy [[3—14]. Many new and interesting hash function 
designs have been proposed as part of the NIST SHA-3 competition [LJ]. The 
large number of submissions and different design strategies require different and 
improved cryptanalytic techniques as well. 

At FSE 2009, Mendel et al. published the rebound attack [9] - a new technique 
for analysis of hash functions which has been applied first to reduced versions of 
the Whirlpool [2] and Grøstl H] compression functions. Recently, the rebound
attack on Whirlpool has been extended in $|, which in some parts is similar to 
our attack. The main idea of the rebound attack is to use the available degrees 
of freedom in the internal state to efficiently fulfill the low probability parts in 
the middle of a differential trail. The straight-forward application of the rebound 
attack to AES based constructions allows a quick and thorough analysis of these 
hash functions. 

In this work, we improve the rebound attack and apply it to the SHA-3
candidate LANE. The hash function LANE [5] uses an iterative construction based on
the Merkle-Damgård design principle and has been first analyzed in [5].
The permutation based compression function consists of a linear message
expansion and 6 parallel lanes. The permutations of each lane are based on the
round transformations of the AES. In the rebound attack on LANE, we first
search for differences and values, according to a specific truncated differential
path. This truncated differential path is constructed such that a collision and
a valid expanded message can be found with a relatively high probability. By
using the degrees of freedom in the chaining values, we are able to construct a
semi-free-start collision for the full versions of LANE-256 with $2^{96}$ compression
function evaluations and memory of $2^{88}$, and for LANE-512 with $2^{224}$
compression function evaluations and memory of $2^{128}$. Although these collisions on the
compression function do not imply an attack on the hash functions, they violate
the reduction proofs of Merkle and Damgård, and Andreeva [1].


2 Description of LANE 


The cryptographic hash function LANE [5] is one of the submissions to the NIST
SHA-3 competition [LI]. It is an iterated hash function that supports four digest 
sizes (224, 256, 384 and 512 bits) and the use of a salt. Since LANE-224 and 
LANE-256 are rather similar except for truncation, we write LANE-256 whenever 
we refer to both of them. The same holds for LANE-384 and LANE-512. 

The hashing of a message proceeds as follows. First, the initial chaining value
$H_{-1}$, of size 256 bits for LANE-256, and 512 bits for LANE-512, is set to an initial
value that depends on the digest size $n$ and the optional salt value $S$. At the same
time, the message is padded and split into message blocks $M_i$ of length 512 bits
for LANE-256, and 1024 bits for LANE-512. Then, a compression function $f$ is
applied iteratively to process message blocks one by one as $H_i = f(H_{i-1}, M_i, C_i)$,
where $C_i$ is a counter that indicates the number of message bits processed so
far. Finally, after all the message blocks are processed, the final digest is derived
from the last chaining value, the message length and the salt by an additional
call to the compression function.
2.1 The Compression Function 


The compression function of LANE-256 transforms 256 bits (512 in the case of
LANE-512) of the chaining value and 512 bits (resp. 1024 bits) of the message
block into a new chaining value of 256 bits (512 bits). It uses a 64-bit counter
value $C_i$. For the detailed structure of the compression function we refer to
the specification of LANE [5]. First, the chaining value and the message block
are processed by a message expansion that produces an expanded state with
doubled size. Then, this expanded state is processed in two layers. The first
layer is composed of six permutation lanes $P_0, \ldots, P_5$ in parallel, and the second
layer of two parallel lanes $Q_0, Q_1$.


2.2 The Message Expansion 


The message expansion of LANE takes a message block $M_i$ and a chaining value
$H_{i-1}$ and produces the input to six permutations $P_0, \ldots, P_5$. In LANE-256, the
512-bit message block $M_i$ is split into four 128-bit blocks $m_0, m_1, m_2, m_3$ and
the 256-bit chaining value $H_{i-1}$ is split into two 128-bit words $h_0, h_1$ as
follows: $m_0\|m_1\|m_2\|m_3 \leftarrow M_i$, $h_0\|h_1 \leftarrow H_{i-1}$. Then, six more 128-bit words
$a_0, a_1, b_0, b_1, c_0, c_1$ are computed


$$\begin{aligned}
a_0 &= h_0 \oplus m_0 \oplus m_1 \oplus m_2 \oplus m_3, & a_1 &= h_1 \oplus m_0 \oplus m_2, \\
b_0 &= h_0 \oplus h_1 \oplus m_0 \oplus m_2 \oplus m_3, & b_1 &= h_0 \oplus m_1 \oplus m_2, \\
c_0 &= h_0 \oplus h_1 \oplus m_0 \oplus m_1 \oplus m_2, & c_1 &= h_0 \oplus m_0 \oplus m_3.
\end{aligned} \qquad (1)$$


Each of these 128-bit values, as in AES, can be seen as a $4 \times 4$ matrix of bytes.
In the following, we will use the notation $x[i, j]$ when we refer to the byte of the
matrix $x$ with row index $i$ and column index $j$, starting from 0.

The values $a_0\|a_1$, $b_0\|b_1$, $c_0\|c_1$, $h_0\|h_1$, $m_0\|m_1$, $m_2\|m_3$ become inputs to the
six permutations $P_0, \ldots, P_5$ described below. The message expansion for larger
variants of LANE is identical but all the values are doubled in size.


2.3 The Permutations 


Each permutation lane P; operates on a state that can be seen as a double AES 
state (2 x 128-bits) in the case of LANE-256 or quadruple AES state (4 x 128- 
bits) for LANE-512. The permutation reuses the transformations SubBytes (SB), 
ShiftRows (SR) and MixColumns (MC) of the AES with the only exception, that 
due to the larger state size, they are applied twice or four times in parallel. 
Additionally, there are three new round transformations introduced in LANE. 
AddConstant adds a different value to each column of the lane state and AddCounter 
adds part of the counter C; to the state. Since our attacks do not depend on these 
functions, we skip their details here. The third transformation is SwapColumns 
(SC) - used for mixing parallel AES states. Let x; be a column of a lane state. In 
LANE-256, SwapColumns swaps the two right columns of the left half-state with the 
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two left columns of the right half-state, and in LANE-512, SwapColumns ensures 
that each column of an AES state gets swapped to a different AES state: 


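$$SC_{256}(x_0 \| x_1 \| \cdots \| x_7) = x_0 \| x_1 \| x_4 \| x_5 \| x_2 \| x_3 \| x_6 \| x_7$$

$$SC_{512}(x_0 \| x_1 \| \cdots \| x_{15}) = x_0 \| x_4 \| x_8 \| x_{12} \| x_1 \| x_5 \| x_9 \| x_{13} \| x_2 \| x_6 \| x_{10} \| x_{14} \| x_3 \| x_7 \| x_{11} \| x_{15}$$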
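As a small illustration of the two SwapColumns variants, the following sketch treats a lane state as a plain list of columns. The function names are hypothetical and the column contents are left abstract; only the reordering described above is modelled.

```python
def swap_columns_256(cols):
    """SwapColumns for LANE-256: 8 columns, two parallel AES states."""
    if len(cols) != 8:
        raise ValueError("LANE-256 state has 8 columns")
    x = cols
    return [x[0], x[1], x[4], x[5], x[2], x[3], x[6], x[7]]


def swap_columns_512(cols):
    """SwapColumns for LANE-512: 16 columns, four parallel AES states.

    Column c of AES state s (index 4*s + c) moves to index 4*c + s,
    i.e. the 4 x 4 arrangement of columns is transposed.
    """
    if len(cols) != 16:
        raise ValueError("LANE-512 state has 16 columns")
    return [cols[4 * (k % 4) + k // 4] for k in range(16)]
```

Both reorderings are involutions: applying the same function twice returns the original column order.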


The complete round transformation consists of the sequential application of all
these transformations in the given order. The last round omits AddConstant and
AddCounter. Each of the permutations $P_i$ consists of six rounds in the case of
LANE-256 and eight rounds for LANE-512.

The permutations $Q_0$ and $Q_1$ are irrelevant to our attack because we will get
collisions before these permutations. An interested reader can find a detailed
description of $Q_0$ and $Q_1$ in the LANE specification.


3 The Rebound Attack on LANE 


In this section, we first give a short overview of the rebound attack in general
and then describe the different phases of the rebound attack on LANE in detail.


3.1 The Rebound Attack 


The rebound attack was published by Mendel et al. and is a new tool
for the cryptanalysis of hash functions. The rebound attack uses truncated differences
and is related to the attack by Peyrin on the hash function
Grindahl [7]. The main idea of the rebound attack is to use the available degrees
of freedom in the internal state to fulfill the low probability parts in the middle 
of a differential path. It consists of an inbound and subsequent outbound phase. 
The inbound phase is an efficient meet-in-the-middle phase, which exploits the 
available degrees of freedom in the middle of a differential path. In the mostly 
probabilistic outbound phase, the matches of the inbound phase are computed 
backwards and forwards to obtain an attack on the hash or compression function. 
Usually, the inbound phase is repeated many times to generate enough starting 
points for the outbound phase. In the following, we describe the inbound and 
outbound phase of the rebound attack on LANE. 


3.2 Outline of the Rebound Attack on LANE 


Due to the message expansion of LANE, at least 4 lanes are active in a differential
attack. We will launch a semi-free-start collision attack, and therefore we assume
the differences in $(h_0, h_1)$ to be zero. Hence, lane $P_3$ is not active and we choose
$P_1$, and thus $(b_0, b_1)$, to be not active as well. The active lanes in our attack
on LANE are $P_0$, $P_2$, $P_4$ and $P_5$. The corresponding truncated differential path
for the P-lanes of LANE-256 is shown in Fig. 2. This path is very similar to
the truncated differential path for LANE-256 shown in the LANE specification
[Fig. 4.2, page 33], but turned upside-down. The truncated differential path used
in the attack on LANE-512 is the same as in the LANE specification [Fig. 4.3,
page 34] and shown in Fig. 3. The main idea of these paths is to use differences
in only one of the parallel AES states for the inbound phases. This allows us
to use the freedom in the other states to satisfy the outbound phases. Since we
search for a collision after the P-lanes, we do not need to consider the Q-lanes.

Fig. 1. The inbound phase for LANE-256 (left) and LANE-512 (right). Black bytes are
active, gray bytes fixed by solutions of the inbound phase.

The main idea of the attack on LANE is that we can apply more than one 
efficient inbound phase by using the degrees of freedom and the relatively slow 
diffusion due to the 2 (or 4) parallel AES states of LANE-256 (or LANE-512). The 
positions of the active bytes of two consecutive inbound phases are chosen such 
that when merging them, the number of the common active bytes of these phases 
is as small as possible. Since we can find many independent solutions for these 
inbound phases, we store them in some lists to be merged. In the outbound 
phase of the attack we merge the results of the inbound phases and further, 
merge the results of all active P-lanes. Note that the merging of two lists can be 
done efficiently. In each merging step, a number of conditions need to be fulfilled 
for the elements of the new list. We merge the lists in a clever order, such that 
we find one colliding pair for the compression function at the end. 

In more detail, we first filter the results of each inbound phase for those 
solutions, which can connect both inbound phases (see Fig. 2). Then, we merge 
the resulting lists of two lanes such that we get a collision after the P-lanes, 
and parts of the message expansion are fulfilled. Finally, we filter the results of 
the left P-lanes ($P_0$, $P_2$) and the right P-lanes ($P_4$, $P_5$), such that the conditions
on the whole message expansion are fulfilled. In the attack, we try to keep the
intermediate results at a reasonable size. We need to ensure that the
complexity of generating the lists is below $2^{n/2}$, but still get enough solutions in
each phase to continue with the attack.


3.3 The Inbound Phase


In the rebound attack on LANE, we first apply the inbound phase a number of
times. Therefore, we will explain this phase and the corresponding probabilities
in detail here. In the inbound phase, we search for differences and values conforming
to the truncated differential path for LANE-256 or LANE-512 shown in Fig. 1
with active bytes marked by black bytes. We only describe the application of one
inbound phase here. In the example of Fig. 1 we have 16 active S-boxes between
state #4 and state #5. It follows from the MDS property of MixColumns that
this path has at least one active byte in each of the 4 corresponding columns
prior to the first and after the second MixColumns transformation (state #2 and
state #7). Note that the active bytes in state #2 and state #7 can also be at
any position marked by gray bytes.

In the inbound phase, we first choose random differences for the 4 active 
bytes after the second MixColumns transformation (state #7). These differences 
are linearly propagated backward to 16 active bytes at the output of the previous 
SubBytes layer (state #5). Next, we take random differences for the 4 active bytes 
prior to the first MixColumns transformation (state #2) and linearly propagate 
forward to 16 active bytes at the input of SubBytes (state #4). Then, we need 
to find a match for the input and output differences of all 16 active S-boxes. For 
a single S-box, the probability that a random S-box differential exists is about 
one half, which can be verified easily by computing the differential distribution 
table of the AES S-box (see [9] for more details). 
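The claim that a random S-box differential exists with probability about one half can be checked directly. The sketch below builds the AES S-box from its algebraic definition (inversion in GF(2^8) followed by the affine map) and computes its differential distribution table; the helper names are hypothetical and the code favours clarity over speed.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def aes_sbox():
    """AES S-box: field inversion (0 maps to 0) followed by the affine map."""
    inv = [0] * 256
    for x in range(1, 256):
        for y in range(1, 256):
            if gf_mul(x, y) == 1:
                inv[x] = y
                break

    def rotl(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF

    return [v ^ rotl(v, 1) ^ rotl(v, 2) ^ rotl(v, 3) ^ rotl(v, 4) ^ 0x63
            for v in inv]


def ddt(sbox):
    """table[a][b] = number of x with S(x) xor S(x xor a) == b."""
    table = [[0] * 256 for _ in range(256)]
    for a in range(256):
        for x in range(256):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def possible_fraction(table):
    """Fraction of non-zero (input, output) difference pairs that can occur."""
    hits = sum(1 for a in range(1, 256) for b in range(1, 256) if table[a][b])
    return hits / (255 * 255)
```

For the AES S-box, every non-zero input difference reaches 127 of the 255 non-zero output differences, so `possible_fraction(ddt(aes_sbox()))` evaluates to 127/255, slightly below one half.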

For each matching S-box, we get at least two (in some cases 4) possible byte
values such that the S-box differential holds. Hence, we get at least $2^{16}$ possible
values for one full AES state, such that the differential path for the chosen
differences in state #2 and state #7 holds. In other words, after trying $2^{16}$
non-zero differences of state #2 and state #7, we get at least $2^{16}$ solutions for the
truncated differential path between state #2 and state #7. Hence, the average
complexity to find one solution for the inbound phase (differences and values) is
about 1. Note that this holds for both LANE-256 and LANE-512.
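The statement that a matching S-box yields two (sometimes four) byte values can be illustrated by solving a single S-box differential by exhaustive search. This is a sketch only: `solve_sbox_differential` is a hypothetical helper, and the S-box is rebuilt from its algebraic definition so that the block is self-contained.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def aes_sbox():
    """AES S-box: field inversion (0 maps to 0) followed by the affine map."""
    inv = [0] * 256
    for x in range(1, 256):
        for y in range(1, 256):
            if gf_mul(x, y) == 1:
                inv[x] = y
                break

    def rotl(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF

    return [v ^ rotl(v, 1) ^ rotl(v, 2) ^ rotl(v, 3) ^ rotl(v, 4) ^ 0x63
            for v in inv]


def solve_sbox_differential(sbox, din, dout):
    """All byte values x with S(x) xor S(x xor din) == dout."""
    return [x for x in range(256) if sbox[x] ^ sbox[x ^ din] == dout]
```

Solutions always come in pairs, since `x` and `x ^ din` satisfy the same differential. With 16 active S-boxes and at least two values each, one matching pair of state differences therefore gives at least $2^{16}$ values for the AES state.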


3.4 The Outbound Phase 


After we have found differences and values for each inbound phase of the active 
lanes, we need to connect these results and propagate them outwards in the 
outbound phase. In backward direction, we need to match the message expansion 
at the input of each lane. In forward direction, we need to match the differences 
of two P-lanes on each side to get a collision. We describe the conditions for 
these two parts according to our truncated differential path in the following. 


The Message Expansion. After the inbound phases, we get values and differences
at the input and output of the 4 active lanes $P_0$, $P_2$, $P_4$ and $P_5$. Since we
have zero differences in $(h_0, h_1)$ and $(b_0, b_1)$, we get using the message expansion
for lane $P_1$ (see Equation (1)):


$$\Delta b_0 = 0 = \Delta m_0 \oplus \Delta m_2 \oplus \Delta m_3, \quad \Delta b_1 = 0 = \Delta m_1 \oplus \Delta m_2$$


Hence, we get the following relation for the message differences in $m_0$, $m_1$, $m_2$
and $m_3$:


$$\Delta m_1 = \Delta m_2 = \Delta m_0 \oplus \Delta m_3 \qquad (2)$$


Using (1) we get for the differences in the expanded message words $(a_0, a_1)$ and
$(c_0, c_1)$:


$$\Delta a_0 = \Delta m_1, \quad \Delta a_1 = \Delta m_3, \quad \Delta c_0 = \Delta m_0, \quad \Delta c_1 = \Delta m_2 \qquad (3)$$


and thus, the following relations between $a_0$, $a_1$, $c_0$ and $c_1$:


$$\Delta a_0 = \Delta c_1 = \Delta a_1 \oplus \Delta c_0 \qquad (4)$$
Besides the differences, we also need to match the values of the message expansion.
Since we aim for a semi-free-start collision, we can freely choose the chaining
value $(h_0, h_1)$ such that the conditions on $(a_0, a_1)$ are satisfied:


$$h_0 = a_0 \oplus m_0 \oplus m_1 \oplus m_2 \oplus m_3, \quad h_1 = a_1 \oplus m_0 \oplus m_2$$


That means we have conditions on the input $(c_0, c_1)$ left, which we need to match
with the message words $m_0$, $m_1$, $m_2$ and $m_3$. Since we can vary lanes $P_0$, $P_2$ and
$P_4$, $P_5$ independently in the following attacks, we can satisfy these conditions by
merging the results of both sides. Using the equations of the message expansion,
we get for $(c_0, c_1)$ using the values of $(a_0, a_1)$:


$$c_0 = a_0 \oplus a_1 \oplus m_0 \oplus m_2 \oplus m_3, \quad c_1 = a_0 \oplus m_1 \oplus m_2$$


We can rearrange these equations in order to have all terms corresponding to
$P_0$, $P_2$ on the left side and all terms of $P_4$, $P_5$ on the right side:


$$c_0 \oplus a_0 \oplus a_1 = m_0 \oplus m_2 \oplus m_3, \quad c_1 \oplus a_0 = m_1 \oplus m_2 \qquad (5)$$


For merging the two sides, we will compute, store and compare the following
values of each list:


$$v_1 = c_0 \oplus a_0 \oplus a_1, \quad v_2 = c_1 \oplus a_0, \quad v_3 = m_0 \oplus m_2 \oplus m_3, \quad v_4 = m_1 \oplus m_2$$
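The relations (2) to (5) follow from Equation (1) by linearity, and they can be sanity-checked numerically. The sketch below is an illustration with hypothetical names: it draws random 128-bit words, imposes the message differences of Equation (2) with zero difference in the chaining value, and checks that the expanded words behave as stated.

```python
import random


def expand_message(h0, h1, m0, m1, m2, m3):
    """Expanded words (a0, a1, b0, b1, c0, c1) of Equation (1)."""
    return (h0 ^ m0 ^ m1 ^ m2 ^ m3, h1 ^ m0 ^ m2,
            h0 ^ h1 ^ m0 ^ m2 ^ m3, h0 ^ m1 ^ m2,
            h0 ^ h1 ^ m0 ^ m1 ^ m2, h0 ^ m0 ^ m3)


def check_relations(rng):
    """Return True if relations (2)-(5) hold for one random instance."""
    h0, h1, m0, m1, m2, m3 = (rng.getrandbits(128) for _ in range(6))
    d0, d3 = rng.getrandbits(128), rng.getrandbits(128)
    d1 = d2 = d0 ^ d3                      # Equation (2)

    w = expand_message(h0, h1, m0, m1, m2, m3)
    w_prime = expand_message(h0, h1, m0 ^ d0, m1 ^ d1, m2 ^ d2, m3 ^ d3)
    da0, da1, db0, db1, dc0, dc1 = (x ^ y for x, y in zip(w, w_prime))

    a0, a1, _, _, c0, c1 = w
    v1, v2 = c0 ^ a0 ^ a1, c1 ^ a0         # computed from lanes P_0, P_2
    v3, v4 = m0 ^ m2 ^ m3, m1 ^ m2         # computed from lanes P_4, P_5

    return (db0 == 0 and db1 == 0                       # lane of (b0, b1) inactive
            and (da0, da1, dc0, dc1) == (d1, d3, d0, d2)  # Equation (3)
            and da0 == dc1 == (da1 ^ dc0)               # Equation (4)
            and v1 == v3 and v2 == v4)                  # Equation (5)
```

In the attack the two sides are produced independently, so $v_1 = v_3$ and $v_2 = v_4$ are conditions to be matched when merging; here they hold automatically because both sides are derived from the same message and chaining value.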


Colliding P-Lanes. In the forward direction, we need to find a collision for the
differences in $P_0$ and $P_2$, such that $\Delta P_0 \oplus \Delta P_2 = 0$, and for the differences in $P_4$
and $P_5$, such that $\Delta P_4 \oplus \Delta P_5 = 0$. Note that we can swap the order of the last
MixColumns with the XOR operation of the P-lanes since both transformations
are linear. Hence, we only need to match the differences after the last SubBytes
layer in each of the two active lanes. The blue bytes in Fig. 2 of LANE-256, or
the red, blue and yellow bytes in Fig. 3 of LANE-512 are independent of the
inbound phase. Hence, we can use the freedom in these bytes to find a collision
after the P-lanes.


4 Semi-Free-Start Collision for LANE-256


In the rebound attack on LANE-256, we construct a semi-free-start collision for
the full compression function using $2^{96}$ compression function evaluations and
memory requirements of $2^{88}$. We will use the 6-round truncated differential path
given in Fig. 2, which is very similar to the one shown in the LANE specification
[Fig. 4.2, page 33]. We search for a collision after the P-lanes of LANE and use
the same truncated differential path in the 4 active lanes $P_0$, $P_2$, $P_4$ and $P_5$. Since
we do not consider differences in $h_0$ and $h_1$, but we fix their values, the result
will be a semi-free-start collision. The attack on LANE-256 consists basically of
the following parts:


1. First Inbound Phase: Apply the inbound phase at the beginning of the
truncated differential path (state #2 to state #7) for each lane $P_0$, $P_2$, $P_4$,
$P_5$ independently.

2. Second Inbound Phase: Apply the inbound phase in the middle of each
lane again (state #10 to state #15).

3. Merge Inbound Phases: Merge the results of the two inbound phases
(state #7 to state #10).

4. Merge Lanes: Merge the two neighboring lanes $P_0$, $P_2$ and $P_4$, $P_5$ and satisfy
the corresponding differences of the message expansion.

5. Message Expansion: Merge the two sides ($P_0$, $P_2$) and ($P_4$, $P_5$) and satisfy
the remaining conditions on the message expansion (differences and values).

6. Find Collisions: Choose remaining free values (neutral bytes) to find a
collision for each side ($P_0$, $P_2$) and ($P_4$, $P_5$) independently.

7. Message Expansion: Merge the two sides ($P_0$, $P_2$) and ($P_4$, $P_5$) and satisfy
the conditions on the message expansion of the remaining bytes.


4.1 First Inbound Phase 


We start the attack on LANE-256 by applying the first inbound phase to each
of the 4 active lanes $P_0$, $P_2$, $P_4$, $P_5$ independently. In each lane, we start with 5
active bytes in state #2 and 8 active bytes in state #7 and choose $2^{96}$ random
non-zero differences for these 13 bytes (note that we could choose up to $2^{104}$
differences). We propagate backward and forward to 16 active bytes at the input
(state #4) and output (state #5) of the SubBytes layer in between. We get at
least $2^{96}$ solutions for the inbound phase with a complexity of $2^{96}$ (see Sect. 3.3).
For each result, only the red and black bytes in Fig. 2 are determined, i.e. the
differences as well as the actual values of the bytes are found. Note that we
have chosen the position of active bytes in state #0, such that at least one term
of Equation (2) or (4) is zero for each byte. At this point, we can compute
backwards to state #0 and independently verify the condition on one byte of
the input differences:


$$P_0: \Delta a_0[0,0] = \Delta a_1[0,0], \quad P_4: \Delta m_0[2,3] = \Delta m_1[2,3]$$
$$P_2: \Delta c_0[2,3] = \Delta c_1[2,3], \quad P_5: \Delta m_2[0,0] = \Delta m_3[0,0]$$


The condition on each of these bytes is fulfilled with a probability of $2^{-8}$ and we
store the $2^{88}$ valid results of each lane $P_0$, $P_2$, $P_4$ and $P_5$ in the corresponding
lists $L_0$, $L_2$, $L_4$ and $L_5$. Note that we store the values and differences of state
#10 (red and black bytes) in these lists, since we need to merge these bytes with
the second inbound phase in the following. For an efficient merging step, the
lists are stored in hash tables (or sorted) according to the bytes to be merged
(differences and values of active bytes in state #10).


4.2 Second Inbound Phase 


Next, we apply the inbound phase again to match the differences at SubBytes
between state #12 and state #13. We start with $2^{64}$ differences in the 8 active
bytes of state #10 and $2^{32}$ differences in the 4 active bytes of state #15. Hence,
we get about $2^{96}$ solutions for the second inbound phase with a complexity of
$2^{96}$. For each result, the gray and black values in Fig. 2 between state #7 and
state #18 are determined. Again, this means we fix the actual values of these
bytes. The results of the second inbound phase for each lane are stored in lists
$L'_0$, $L'_2$, $L'_4$ and $L'_5$. Each entry of these lists holds the values and differences of state
#10 (gray and black bytes). Again, the lists are stored in hash tables (or sorted)
according to the bytes (black bytes) to be merged.


4.3 Merge Inbound Phases 


The two previous inbound phases overlap in 8 active bytes (state #7 to state
#10). We connect the two inbound phases by checking the conditions on the
overlapping bytes of state #10. Since both values and differences need to match,
we get a condition on 128 bits. We merge the $2^{88}$ results of the first inbound
phase and $2^{96}$ results of the second inbound phase to get $2^{88} \times 2^{96} \times 2^{-128} = 2^{56}$
differential paths for each lane. A pair connecting both inbound phases is found
trivially. For each node of the first list (for example $L_0$), we check the overlapping
bytes against the values of the second list ($L'_0$). Since the second list is a hash
table, the effort for producing all $2^{56}$ valid pairs is $2^{88}$ hash table lookups.
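The merging step is an ordinary hash join. The sketch below shows the idea on generic Python lists; `merge_lists` and the key functions are hypothetical names, and in the attack the key would be the values and differences of the overlapping bytes of state #10.

```python
from collections import defaultdict


def merge_lists(left, right, key_left, key_right):
    """Return all pairs (l, r) whose keys agree.

    The right list is indexed once in a hash table, so the cost is one
    lookup per entry of the left list plus the number of matches produced.
    """
    table = defaultdict(list)
    for r in right:
        table[key_right(r)].append(r)
    return [(l, r) for l in left for r in table.get(key_left(l), ())]
```

A toy usage with the first tuple component as the matching key:

```python
first = [(1, "a"), (2, "b"), (3, "c")]
second = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
pairs = merge_lists(first, second, lambda e: e[0], lambda e: e[0])
# pairs == [((2, "b"), (2, "x")), ((3, "c"), (3, "y")), ((3, "c"), (3, "z"))]
```

For lists of $2^{x}$ and $2^{y}$ entries and a condition on $k$ bits, the expected number of pairs is $2^{x+y-k}$, which is the counting used throughout this section.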

Note that for each pair which satisfies and connects both inbound phases,
the differences and values between state #0 and state #18 (black, red and gray
bytes) are determined. We compute and store the $2^{56}$ input values and differences
of state #0 in lists $L_0$, $L_2$, $L_4$ and $L_5$. Although we still do not know half of the
state, each of these input pairs conforms to the whole truncated differential path
from state #0 to state #24 with a probability of 1. In other words, we know
that in state #24, there are at most the given bytes active.


4.4 Merge Lanes 


Next, we continue with merging the solutions of each lane by considering the
message expansion. We first combine the inputs of lanes $P_0$ and $P_2$ by merging
lists $L_0$ and $L_2$. When merging these lists, we need to satisfy the conditions on
the differences of the message expansion. We have conditions on 5 active bytes
of state #0 in lanes $P_0$ and $P_2$ (see Fig. 2). Remember that we have chosen the
position of these active bytes, such that at least one term of Equation (2) or (4)
is zero. Hence, we only need to check if two corresponding byte differences are
equal. Since we have already verified one byte difference (see Sect. 4.1), we have
4 byte conditions left:


$$\Delta a_0[0,0] = \Delta c_1[0,0], \quad \Delta a_1[0,1] = \Delta c_0[0,1] \qquad (6)$$
$$\Delta a_1[1,1] = \Delta c_0[1,1], \quad \Delta a_0[2,3] = \Delta c_0[2,3] \qquad (7)$$


These conditions are fulfilled with a probability of $2^{-32}$ and by merging two lists
($L_0$ and $L_2$) of size $2^{56}$, we get $2^{56} \times 2^{56} \times 2^{-32} = 2^{80}$ valid matches which we
store in list $L_{02}$. We repeat the same for lanes $P_4$ and $P_5$ by merging lists $L_4$ and
$L_5$. We get $2^{80}$ matches for list $L_{45}$ as well, since we need to fulfill the 32-bit
conditions on the differences of the following 4 bytes:


$$\Delta m_1[0,0] = \Delta m_3[0,0], \quad \Delta m_0[0,1] = \Delta m_3[0,1] \qquad (8)$$
$$\Delta m_0[1,1] = \Delta m_3[1,1], \quad \Delta m_0[2,3] = \Delta m_2[2,3] \qquad (9)$$


Again, if we use hash tables or the previous lists are sorted according to the
bytes to match, the merge operation can be performed very efficiently. Hence,
the total complexity to produce the lists $L_{02}$ and $L_{45}$ is determined by their final
size and requires an effort of around $2^{80}$ computations.


4.5 Message Expansion 


For all entries of the lists $L_{02}$ and $L_{45}$, the values in 32 bytes and differences in
10 bytes of each of $(a_0, a_1, c_0, c_1)$ and $(m_0, m_1, m_2, m_3)$ have been fixed (red and
black bytes in state #0 of Fig. 2). Note that the conditions on the differences of
each side on its own have already been fulfilled ($P_0 \leftrightarrow P_2$ and $P_4 \leftrightarrow P_5$). Hence,
if we just fulfill the conditions on the remaining differences between $P_0 \leftrightarrow P_4$,
then the conditions on $P_2 \leftrightarrow P_5$ are satisfied as well. Using Equations (2)-(4),
the position of active bytes in Fig. 2 and the already matched differences of
Sect. 4.1 and Sect. 4.4, we only have the following 4 byte conditions left:


$$\Delta a_0[0,0] = \Delta m_1[0,0], \quad \Delta a_1[0,1] = \Delta m_0[0,1]$$
$$\Delta a_1[1,1] = \Delta m_0[1,1], \quad \Delta a_0[2,3] = \Delta m_0[2,3]$$


Note that we also need to fulfill the conditions on the values of the states.
Remember that we can freely choose the chaining values $(h_0, h_1)$ to satisfy the
values in the first 16 bytes of the message expansion $(a_0, a_1)$. To fulfill the
conditions on the 16 bytes of $(c_0, c_1)$ we need to satisfy Equation (5) using the
corresponding values $v_1$, $v_2$, $v_3$ and $v_4$. Hence, we need to find a match for the
following values and differences by merging lists $L_{02}$ and $L_{45}$:

- 8 bytes of $v_1$ from $L_{02}$ with $v_3$ from $L_{45}$,
- 8 bytes of $v_2$ from $L_{02}$ with $v_4$ from $L_{45}$,
- 4 bytes of differences in $L_{02}$ and in $L_{45}$.


Since we have $2^{80}$ elements in each list and conditions on 160 bits, we expect to
find $2^{80} \times 2^{80} \times 2^{-160} = 1$ result. This result satisfies the message expansion for
all lanes and is a solution for the truncated differential path of each active lane
between state #0 and state #24. However, we do not get a collision at the end
of the P-lanes yet, since we do not know the differences of state #24.


4.6 Find Collisions 


In this phase of the attack, we search for a collision at the end of the P-lanes
($P_0$, $P_2$) and ($P_4$, $P_5$) using the remaining freedom in the second half of the state.
Note that the 16-byte difference in state #24 is obtained from the 8-byte difference
in state #22 with the linear transforms MixColumns and SwapColumns. Hence,
the collision space (the 16 bytes where the two lanes differ) has only $2^{64}$ distinct
elements. If we take a look at Fig. 2, we get for the values in state #7:


- The black, red and gray bytes represent values which have already been
determined by the previous parts of the attack.

- The blue bytes represent values not yet determined and can be used to vary
the differences in state #22.


To find a collision between two lanes, we can still choose $2^{64}$ values for the blue
bytes in state #7 of each lane and store these results in lists $L_0$, $L_2$, $L_4$ and
$L_5$. Note that for these $2^{64}$ values, we get only $2^{32}$ different values for the two
free bytes in the first and fifth column of state #18. Hence, we can only iterate
through $2^{32}$ differences in state #22 for each lane. However, this is enough to
find one colliding difference for each side, since $2^{32} \times 2^{32} \times 2^{-64} = 1$. By repeating
this step $2^{32}$ times for each side, we expect $2^{64} \times 2^{64} \times 2^{-64} = 2^{64}$ results for
each merged list $L_{02}$ and $L_{45}$.


4.7 Message Expansion 


Finally, we need to match the message expansion for the remaining 32 bytes
of each side. Hence, we just repeat the same procedure as we did for the first
half of state #0, except that we only need to match the values of 32 bytes but
no differences. Again, we can use the remaining bytes of $(h_0, h_1)$ to fulfill the
conditions on 16 bytes of $(a_0, a_1)$. Since we have $2^{64}$ solutions in each list $L_{02}$
and $L_{45}$, we expect to find $2^{64} \times 2^{64} \times 2^{-128} = 1$ colliding pair for $(c_0, c_1)$ and
thus, a collision for the full compression function of LANE-256.


4.8 Complexity 


Let us find the complexity of the whole attack. The first inbound phase requires
$2^{96}$ computations and $2^{88}$ memory, the second inbound phase requires $2^{96}$ computations
and $2^{96}$ memory, and the merging of the inbound phases requires $2^{88}$ hash table
lookups and $2^{56}$ memory. Obviously, the second inbound phase and the merge
inbound phases can be united to lower the memory requirement of these three
steps. Namely, we create the lists $L_0$, $L_2$, $L_4$ and $L_5$ in the first inbound phase.
Then, for each differential path of the second inbound phase, instead of storing
it in a list, we immediately check if it can be merged with some differential from
the lists. Only if it can be merged, we do the outbound phase and compute state
#0. Hence, the first three steps of our attack require around $2^{96}$ computations
and $2^{88}$ memory. The merge lanes step requires $2^{80}$ computations and memory.
The message expansion steps require $2^{80}$ computations, while the find collisions
steps require $2^{32}$ computations. Hence, the total attack complexity is around
$2^{96}$ computations and $2^{88}$ memory. Note that the cost of each computation is
never greater than the cost of one compression function evaluation. Therefore,
the complexity to find a semi-free-start collision for all 6 rounds of LANE-256 is
about $2^{96}$ compression function evaluations and $2^{88}$ memory.
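The list sizes quoted in Sects. 4.1 to 4.7 all follow from the same rule: merging lists of $2^{x}$ and $2^{y}$ entries under a $k$-bit condition leaves about $2^{x+y-k}$ entries. The following sketch replays that bookkeeping in base-2 logarithms. The helper name `merged_log2` and the dictionary keys are hypothetical, and the numbers are simply those stated in the text, not an independent derivation.

```python
def merged_log2(log2_a, log2_b, condition_bits):
    """log2 of the expected number of matches between two lists."""
    return log2_a + log2_b - condition_bits


def lane256_list_sizes():
    """log2 sizes of the intermediate results of the LANE-256 attack."""
    sizes = {}
    # Sect. 4.1: 2^96 inbound solutions, one byte condition (8 bits) per lane.
    sizes["first_inbound"] = 96 - 8
    # Sect. 4.3: merge with 2^96 second-inbound results on 128 bits.
    sizes["paths_per_lane"] = merged_log2(sizes["first_inbound"], 96, 128)
    # Sect. 4.4: merge two lanes on 4 byte differences (32 bits).
    sizes["per_side"] = merged_log2(sizes["paths_per_lane"],
                                    sizes["paths_per_lane"], 32)
    # Sect. 4.5: merge both sides on 160 bits (values and differences).
    sizes["after_expansion_1"] = merged_log2(sizes["per_side"],
                                             sizes["per_side"], 160)
    # Sect. 4.6: 2^64 choices per lane, 64-bit collision condition.
    sizes["collisions_per_side"] = merged_log2(64, 64, 64)
    # Sect. 4.7: match the values of the remaining 16 bytes of (c0, c1).
    sizes["after_expansion_2"] = merged_log2(sizes["collisions_per_side"],
                                             sizes["collisions_per_side"], 128)
    return sizes
```

A log2 size of 0 means that one solution is expected, which is what the two message expansion steps need. The dominant cost stated in the text, $2^{96}$ computations with $2^{88}$ memory, comes from the inbound phases and the stored list of the first inbound phase.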


5 Semi-Free-Start Collision for LANE-512 


In the rebound attack on LANE-512, we construct a semi-free-start collision for
the full, 8-round compression function using $2^{224}$ compression function evaluations
and memory requirements of $2^{128}$. We use the same iterative truncated
differential path as shown in the specification of LANE-512 [Fig. 4.3, page 34],
which is given in Fig. 3. Similar to the attack on LANE-256, we search for a
collision after the P-lanes and use the same truncated differential path in the 4
active lanes $P_0$, $P_2$, $P_4$ and $P_5$. The attack on LANE-512 consists basically of the
following parts:


1. First Inbound Phase: Apply the inbound phase at the beginning of the
truncated differential path (state #2 to state #7) for each lane $P_0$, $P_2$, $P_4$,
$P_5$ independently.

2. Merge Lanes: Merge the two neighboring lanes $P_0$, $P_2$ and $P_4$, $P_5$ and satisfy
the corresponding differences of the message expansion.

3. Message Expansion: Merge the two sides ($P_0$, $P_2$) and ($P_4$, $P_5$) and satisfy
the remaining conditions on the message expansion (differences and values).

4. Second Inbound Phase: Apply the inbound phase in the middle of each
lane again (state #10 to state #15).

5. Merge Inbound Phases: Merge the results of the two inbound phases.

6. Starting Points: Choose random values for the brown bytes in state #7 to
get enough starting points for the subsequent phases.

7. Merge Lanes: Merge the values of the starting points for the two neighboring
lanes $P_0$, $P_2$ and $P_4$, $P_5$ and satisfy the corresponding differences of the
message expansion.

8. Message Expansion: Merge the two sides ($P_0$, $P_2$) and ($P_4$, $P_5$) and satisfy
the remaining conditions on the message expansion (differences and values)
for the starting points.

9. Third Inbound Phase: Apply the inbound phase at the end of each lane
for a third time (state #18 to state #23).

10. Merge Inbound Phases: Merge the results of the three inbound phases
and use the remaining freedom in between.

11. Find Collisions: Merge the corresponding two lanes to find a collision for
each side ($P_0$, $P_2$) and ($P_4$, $P_5$) independently.

12. Message Expansion: Merge the two sides ($P_0$, $P_2$) and ($P_4$, $P_5$) and satisfy
the conditions on the message expansion of the remaining bytes.


5.1 First Inbound Phase 


We start the attack on LANE-512 by applying the first inbound phase to each
of the 4 active lanes $P_0$, $P_2$, $P_4$, $P_5$ independently. In each lane, we start with 8
active bytes in state #2 and 4 active bytes in state #7 and choose $2^{84}$ random
non-zero differences for these 12 bytes (note that we could choose up to $2^{96}$
differences). We propagate backward and forward to 16 active bytes at the input
(state #4) and output (state #5) of the SubBytes layer in between. We get at
least $2^{84}$ matches for the inbound phase with a complexity of $2^{84}$ (see Sect. 3.3).
For each result, the gray and black bytes in Fig. 3 are determined. Hence, we
can already verify the condition on one byte of the input differences for each
lane by computing backwards to state #0:


$$P_0: \Delta a_0[2,2] = \Delta a_1[2,2], \quad P_0: \Delta a_0[2,6] = \Delta a_1[2,6]$$
$$P_2: \Delta c_0[1,1] = \Delta c_1[1,1], \quad P_2: \Delta c_0[1,5] = \Delta c_1[1,5]$$
$$P_4: \Delta m_0[1,1] = \Delta m_1[1,1], \quad P_4: \Delta m_0[1,5] = \Delta m_1[1,5]$$
$$P_5: \Delta m_2[2,2] = \Delta m_3[2,2], \quad P_5: \Delta m_2[2,6] = \Delta m_3[2,6]$$


The conditions on each of the lanes are fulfilled with a probability of $2^{-16}$ and we
store the $2^{68}$ valid matches of each lane $P_0$, $P_2$, $P_4$ and $P_5$ in the corresponding
lists $L_0$, $L_2$, $L_4$ and $L_5$.


5.2 Merge Lanes 


Next, we continue with merging the solutions of each lane by considering the 
message expansion. We first combine the results of lanes $P_0$ and $P_2$ by merging
lists $L_0$ and $L_2$. When merging these lists, we need to satisfy the conditions on
the differences of the message expansion for the following 6 bytes: 


Since this match is fulfilled with a probability of $2^{-48}$ and we merge two lists
of size $2^{68}$, we get $2^{68} \times 2^{68} \times 2^{-48} = 2^{88}$ valid matches which we store in $L_{02}$.
We repeat the same for lanes $P_4$ and $P_5$ and merge lists $L_4$ and $L_5$. We get $2^{88}$ matches
for list $L_{45}$, since we need to fulfill conditions on differences of 6 bytes as well:


$$\Delta m_0[0,0] = \Delta m_3[0,0], \quad \Delta m_0[0,4] = \Delta m_3[0,4]$$
$$\Delta m_0[1,1] = \Delta m_2[1,1], \quad \Delta m_0[1,5] = \Delta m_2[1,5]$$
$$\Delta m_1[2,2] = \Delta m_3[2,2], \quad \Delta m_1[2,6] = \Delta m_2[2,6]$$


Fig. 3. The truncated differential path for 8 rounds of LANE-512. Lane $P_0$ shows the
plain truncated differential path, lane $P_2$ other possible truncated differential paths
and lanes $P_4$ and $P_5$ are used to describe the attack.


5.3 Message Expansion 


For all entries of lists $L_{02}$ and $L_{45}$, the values in 32 bytes and differences in 16
bytes of each of $(a_0, a_1, c_0, c_1)$ and $(m_0, m_1, m_2, m_3)$ have been fixed (gray and
black bytes in state #0 of Fig. 3). Since the conditions on the differences of each
side on its own have already been fulfilled, we just need to match the conditions
on the remaining 6-byte differences between each side ($P_0$, $P_2$) and ($P_4$, $P_5$):


$$\Delta a_1[0,0] = \Delta m_0[0,0], \quad \Delta a_1[0,4] = \Delta m_0[0,4]$$
$$\Delta a_0[1,1] = \Delta m_0[1,1], \quad \Delta a_0[1,5] = \Delta m_0[1,5]$$
$$\Delta a_0[2,2] = \Delta m_1[2,2], \quad \Delta a_0[2,6] = \Delta m_1[2,6]$$


Remember that we can freely choose the chaining values $(h_0, h_1)$ to satisfy the
values in the first 16 bytes of the message expansion $(a_0, a_1)$. To fulfill the
conditions on the 16 bytes of $(c_0, c_1)$ we need to find matches for the following values
and differences using lists $L_{02}$ and $L_{45}$:


- 8 bytes of $v_1$ from $L_{02}$ with $v_3$ from $L_{45}$,
- 8 bytes of $v_2$ from $L_{02}$ with $v_4$ from $L_{45}$,
- 6 bytes of differences in $L_{02}$ and in $L_{45}$.


Since we have $2^{88}$ elements in each list and conditions on 176 bits, we expect to
find $2^{88} \times 2^{88} \times 2^{-176} = 1$ result. This result satisfies the message expansion for
all lanes and is a solution for the truncated differential path of each active lane
between state #0 and state #10.


5.4 Second Inbound Phase 


Next, we apply the inbound phase again to match the differences at SubBytes
between state #12 and state #13. After the first inbound phase, the values of
16 bytes in state #10 (black and gray bytes), and the difference in 16 bytes (1st
AES-block) of state #12 (black bytes) have already been fixed. Hence we can
start with $2^{32}$ possible 4-byte differences in state #15, compute backwards to
state #13 and need to match the differences in the SubBytes layer. We expect
to find at least $2^{32}$ solutions for the second inbound phase (see Sect. 3.3).


5.5 Merge Inbound Phases 


The results of the second inbound phase are $2^{32}$ values for the 16 bytes in state
#10 (green and black bytes). From the first inbound phase, we have obtained
one solution for 16 bytes in state #10 (gray and black bytes) as well. In these
16 bytes, the values of the 4 active bytes (black) overlap between both inbound
phases and the probability for a successful match is $2^{-32}$. Among the $2^{32}$ results
of the second inbound phase, we expect to find one solution to match the values
of state #10. Once we have found a match, we can compute the values of the
newly determined 12 bytes in state #7, marked by green bytes in Fig. 3.


5.6 Starting Points 


In this phase of the attack, we will compute a number of starting points which
we will need for the subsequent steps. For each lane, we choose random values
for the 12 bytes in state #7 (marked by brown bytes in Fig. 3) and compute
the corresponding 16-byte values in state #0. We repeat this step $2^{64}$ times and
store the results in the corresponding lists $L'_0$, $L'_2$, $L'_4$ or $L'_5$.


5.7 Merge Lanes 


Next, we merge lists $L'_0$ and $L'_2$ to get the list $L'_{02}$, consisting of $2^{128}$ values for
the 32 newly determined bytes of $(m_0, m_1, m_2, m_3)$ (brown bytes of state #0 in
lanes $P_0$ and $P_2$). Further, we merge lists $L'_4$ and $L'_5$ to get the list $L'_{45}$ of size
$2^{128}$ containing the 32 byte values of $(a_0, a_1, c_0, c_1)$.


5.8 Message Expansion 


Finally, we satisfy the conditions of the message expansion on $(a_0, a_1)$ using the
values of $(h_0, h_1)$, and use the two lists $L'_{02}$ and $L'_{45}$ to satisfy the conditions on
$(c_0, c_1)$. Since we need to match 16 bytes of $(c_0, c_1)$ and have $2^{128}$ elements in
both lists, we expect $2^{128} \times 2^{128} \times 2^{-128} = 2^{128}$ matching pairs which we store
in list $L_s$. We will use these values in a later phase of the attack.


5.9 Third Inbound Phase 


Now, we extend the truncated differential path by applying a third inbound
phase between state #18 and state #23 for each active lane. Note that the
values in 16 bytes of state #18 (black and green bytes), and the differences in 16
bytes (1st AES-block) of state #20 (black bytes) have already been fixed due to
the second inbound phase. Similar to the second inbound phase, we start with
$2^{32}$ 4-byte differences in state #23 and compute backwards to state #21 to get
a match for the SubBytes layer. Since we have $2^{32}$ starting differences, we expect
to find $2^{32}$ results for the third inbound phase, with fixed values and differences
for the 16 bytes in state #15 (purple and black bytes).


5.10 Merge Inbound Phases 


The values of the second and the third inbound phase overlap in 4 active bytes
(black) of state #18. Since we have $2^{32}$ results of the third inbound phase, we
expect to find one solution after merging the two phases. Once we have found
a match, we can compute the values of the newly determined 12 bytes in state
#15, marked by purple bytes in Fig. 3. Next, we need to connect all three
inbound phases. For all possible 8-byte values of state #10 marked by red bytes,
we compute the 16 corresponding bytes in state #15 (2nd AES-block). If the
computed values satisfy the 4 bytes in state #15 marked by purple, we store
the result of each lane in the corresponding lists $L_0^r$, $L_2^r$, $L_4^r$ and $L_5^r$. In total,
we obtain $2^{64} \cdot 2^{-32} = 2^{32}$ entries in each list. We repeat the same for the bytes
marked by blue and yellow, and generate the lists $L_i^b$ and $L_i^y$ for each of the
active lanes with index $i \in \{0, 2, 4, 5\}$. For each lane, we merge the three lists
$L_i^r$, $L_i^b$ and $L_i^y$ and store the $2^{96}$ results in lists $L_i^*$. Note that for each entry in
these lists, we can determine all values and differences of the corresponding lane.


5.11 Find Collisions 


In this phase of the attack, we finally search for a collision at the end of the
P-lanes (P_0, P_2) and (P_4, P_5) using the elements of lists L^i. To find a collision at
the end of the P-lanes, we need to match the 16 byte differences in state #32 of
the two corresponding active lanes such that Δ(P_0 ⊕ P_2) = 0 and Δ(P_4 ⊕ P_5) = 0.
Note that we can satisfy these conditions independently for each side (P_0, P_2)
and (P_4, P_5). Since we need to match 128 bits and we have 2^{96} elements in each
list L^i, we expect to find 2^{96} · 2^{96} · 2^{-128} = 2^{64} collisions for each side. We store
the corresponding inputs (a_0, a_1, c_0, c_1) for the collisions between lanes P_0 and
P_2 in list L'_{02}, and the inputs (m_0, m_1, m_2, m_3) for the collisions between lanes P_4
and P_5 in list L'_{45}.


5.12 Message Expansion 


Finally, we need to match the message expansion for the remaining 32 bytes
of each side. Hence, we just repeat the same procedure as we did for the first
part of state #0, except that we only need to match the values of 32 bytes
but no differences. Again, we use the values of (h_0, h_1) to satisfy the conditions
on (a_0, a_1) first. Then, we match the values of the 32 bytes in (c_0, c_1). Since
we only have 2^{64} entries in both of L'_{02} and L'_{45}, the success probability for a
match is 2^{64} · 2^{64} · 2^{-256} = 2^{-128}. However, we can still repeat the preceding
steps using a different starting point stored in list Ls. Since we have 2^{128}
elements in list Ls, we can repeat the previous steps up to 2^{128} times. Hence,
we expect to find one valid match for the message expansion and thus a collision
for the full compression function of LANE-512.


5.13 Complexity 


The total complexity of the rebound attack on LANE-512 is determined by
the merging step after the third inbound phase. This step has a complexity
of 2^{96} compression function evaluations and is repeated 2^{128} times. The memory
requirements are determined by the largest lists, which are L_{02} and L_{45} (or Ls)
with a size of 2^{128}. Hence, the total complexity to find a semi-free-start collision
for LANE-512 is about 2^{128} · 2^{96} = 2^{224} compression function evaluations and
2^{128} in memory.


6 Conclusion 


In this work, we have applied the rebound attack to the hash function LANE. 
In the attack we use a truncated differential path with differences concentrating 
mostly in one part of the lanes. Due to the relatively slow diffusion of parallel 
AES rounds, we are therefore able to solve parts of the lanes independently. 
First, we search for differences and values (for parts of the state) according to 
the truncated differential path and also satisfy the message expansion. Then, we
choose values which can be changed such that the truncated differential path and
the corresponding message expansion still hold. The freedom in these values is then used
to search for a collision at the end of the lanes without violating the differential 
path or message expansion. 

In the rebound attack on LANE, we are able to construct semi-free-start
collisions for full-round LANE-224 and LANE-256 with 2^{96} compression function
evaluations and memory of 2^{88}, and for full-round LANE-512 with complexity of
2^{224} compression function evaluations and memory of 2^{128}. Although these
collisions on the compression function do not imply an attack on the hash functions,
they violate the reduction proofs of Merkle and Damgård, or Andreeva in the
case of LANE. However, due to the limited degrees of freedom, a collision attack
on the hash function seems to be difficult for full-round LANE.

Abstract. Whirlpool is a hash function based on a block cipher that
can be seen as a scaled up variant of the AES. The main difference is the 
(compared to AES) extremely conservative key schedule. In this work, 
we present a distinguishing attack on the full compression function of 
Whirlpool. We obtain this result by improving the rebound attack on 
reduced Whirlpool with two new techniques. First, the inbound phase of 
the rebound attack is extended by up to two rounds using the available 
degrees of freedom of the key schedule. This results in a near-collision 
attack on 9.5 rounds of the compression function of Whirlpool with a 
complexity of 2^{176} and negligible memory requirements. Second, we show
how to turn this near-collision attack into a distinguishing attack for the 
full 10 round compression function of Whirlpool. This is the first result 
on the full Whirlpool compression function. 


Keywords: hash functions, cryptanalysis, near-collision, distinguisher. 


1 Introduction 


In the last few years the cryptanalysis of hash functions has become an important 
topic within the cryptographic community. Especially the collision attacks on the 
MD4 family of hash functions (MD4, MD5, SHA-1) have weakened the security
assumptions of these commonly used hash functions [...]. Still, most
of the existing cryptanalytic work has been published for this particular family 
of hash functions. Therefore, the analysis of alternative hash functions is of great 
interest. In this article, we will present a security analysis of the Whirlpool hash 
function with respect to collision resistance. 

Whirlpool is the only hash function standardized by ISO/IEC 10118-3:2004 
(since 2000) that does not follow the MD4 design strategy. Furthermore, it has 
been evaluated and approved by NESSIE [20]. Whirlpool is commonly considered 
to be a conservative block-cipher based design with an extremely conservative 
key schedule and follows the wide-trail design strategy [4,5]. Since its proposal
in 2000, only a few results have been published.

Table 1. Summary of results for Whirlpool. Complexities are given in compression
function evaluations, a memory unit refers to a state (512 bits). The complexities in
brackets refer to modified attacks using a precomputed table taking 2^{128} time/memory
to set up. Cells marked "..." are not recoverable from the source.

target                rounds  complexity (runtime/memory)          type            source
block cipher W        6       ...                                  distinguisher   Knudsen [11]
hash function         4.5     2^{120} / ...                        collision       Mendel et al., FSE 2009
hash function         6.5     2^{128} / ...                        near-collision  Mendel et al., FSE 2009
compression function  5.5     2^{120} / ...                        collision       Mendel et al., FSE 2009
compression function  7.5     2^{128} / ...                        near-collision  Mendel et al., FSE 2009
hash function         ...     ...                                  collision       Appendix A
hash function         ...     ...                                  near-collision  Appendix A
compression function  7.5     2^{184} / 2^8 (2^{120} / 2^{128})    collision       Sect. 4
compression function  9.5     2^{176} / 2^8 (2^{112} / 2^{128})    near-collision  Sect. 4
compression function  10      2^{188} / 2^8 (2^{121} / 2^{128})    distinguisher   Sect. 5


Related Work. At FSE 2009, Mendel et al. proposed a new technique for
the analysis of hash functions: the rebound attack [16]. It can be applied to both
block cipher based and permutation based constructions. The idea of the rebound
attack is to divide an attack into two phases, an inbound and an outbound phase.
In the inbound phase, degrees of freedom are used, such that in the outbound
phase several rounds can be bypassed in both the forward and backward direction.
This led to successful attacks on round-reduced Whirlpool for up to 7.5 (out of
10) rounds. The results are summarized in Table 1.

For the block cipher W that is implicitly used in the Whirlpool compression
function, Knudsen described an integral distinguisher for 6 out of 10 rounds [11].
Furthermore, it is assumed that this property may extend also to 7 rounds. Note
that in [2] similar techniques were used to obtain known-key distinguishers for
7 rounds of the AES.


Our Contribution. The main contribution of this paper is a distinguishing
attack on the full compression function of Whirlpool, which is achieved by
improving upon the work of Mendel et al. in [16] in several ways.

We start with a description of the hash function Whirlpool. Then, in Sect. 3,
we give an overview of the rebound attack and show how it is applied to reduced
versions of Whirlpool. In Sect. 4, we describe our improvement of the rebound
attack on Whirlpool in detail. This technique enables us to add two rounds in
the inbound phase of the attack and thus gives a collision and near-collision
attack on the Whirlpool compression function reduced to 7.5 and 9.5 rounds,
respectively. Based on this, we describe in Sect. 5 a new generic attack and show
how to distinguish the full (all 10 rounds) compression function of Whirlpool
from a random function by turning the near-collision attack for 9.5 rounds into
a distinguishing attack for 10 rounds. To the best of our knowledge this is the
first result on the full Whirlpool compression function. Table 1 summarizes the
previous results on Whirlpool as well as the contributions of this paper.

2 Description of Whirlpool


Whirlpool is a cryptographic hash function designed by Barreto and Rijmen in
2000 [1]. It is an iterative hash function based on the Merkle-Damgård design
principle (cf. [18]). It processes 512-bit message blocks and produces a 512-bit
hash value. If the message length is not a multiple of 512, an unambiguous
padding method is applied. For the description of the padding method we refer
to [1]. Let M = M_1 || M_2 || ... || M_t be a t-block message (after padding). The hash
value h = H(M) is computed as follows:


    H_0 = IV                                                    (1)
    H_j = W(H_{j-1}, M_j) ⊕ H_{j-1} ⊕ M_j    for 0 < j ≤ t      (2)
    h = H_t                                                     (3)


where IV is a predefined initial value and W is a 512-bit block cipher used in
the Miyaguchi-Preneel mode [18]. The block cipher W used by Whirlpool is very
similar to the Advanced Encryption Standard (AES) [9].
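
The chaining rule of the Miyaguchi-Preneel mode can be sketched in a few lines. The block cipher below, `toy_w`, is a purely hypothetical stand-in built from SHA-512 so that the example runs; it is not the Whirlpool cipher W and not even a permutation, and padding is assumed to have been applied already.

```python
import hashlib

BLOCK_BYTES = 64  # 512-bit message blocks and chaining values


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_w(key, block):
    """Hypothetical stand-in for the block cipher W (NOT the real W)."""
    return hashlib.sha512(key + block).digest()


def miyaguchi_preneel(iv, blocks, w=toy_w):
    """Iterate H_j = W(H_{j-1}, M_j) xor H_{j-1} xor M_j over all blocks."""
    h = iv
    for m in blocks:
        h = xor_bytes(xor_bytes(w(h, m), h), m)
    return h


iv = bytes(BLOCK_BYTES)
message_blocks = [b"\x01" * BLOCK_BYTES, b"\x02" * BLOCK_BYTES]
digest = miyaguchi_preneel(iv, message_blocks)
print(len(digest))  # 64
```

The chaining value is used as the cipher key and is fed forward together with the message block, which is the feed-forward exploited later in the attacks.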

The state update transformation and the key schedule update an 8 x 8 state
S and K of 64 bytes in 10 rounds. In one round, the state is updated by the
round transformation r_i as follows:


    r_i = AK ∘ MR ∘ SC ∘ SB.
The round transformations are briefly described here: 


— the non-linear layer SubBytes (SB) applies an S-Box to each byte of the state 
independently. 

— the cyclical permutation ShiftColumns (SC) rotates the bytes of column j
downwards by j positions.

— the linear diffusion layer MixRows (MR) is a right-multiplication by the 8 x 8
circulant MDS matrix cir(1,1,4,1,8,5,2,9).

— the key addition AddRoundKey (AK) adds the round key K_i to the 8 x 8 state,
and AddConstant (AC) adds the round constant C_i to the 8 x 8 state of the
key schedule.
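
The two linear layers can be illustrated with a small sketch. It assumes the state is stored as a list of 8 rows of 8 bytes, that the field GF(2^8) is defined by the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), and that the circulant matrix is indexed as C[k][j] = cir[(j - k) mod 8]; these conventions, like all function names, are assumptions of this sketch rather than a reference implementation.

```python
POLY = 0x11D                      # x^8 + x^4 + x^3 + x^2 + 1
CIR = (1, 1, 4, 1, 8, 5, 2, 9)    # first row of the circulant matrix


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo POLY (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def shift_columns(state):
    """Rotate column j downwards by j positions."""
    return [[state[(i - j) % 8][j] for j in range(8)] for i in range(8)]


def mix_rows(state):
    """Right-multiply every row by the circulant matrix."""
    result = []
    for row in state:
        new_row = []
        for j in range(8):
            acc = 0
            for k in range(8):
                acc ^= gf_mul(row[k], CIR[(j - k) % 8])
            new_row.append(acc)
        result.append(new_row)
    return result


# One active byte in a row spreads to all 8 bytes of that row.
state = [[0] * 8 for _ in range(8)]
state[0][1] = 0x53
print(sum(1 for byte in mix_rows(state)[0] if byte != 0))  # 8
```

Since every entry of the circulant matrix is non-zero, a single active byte always activates the whole row, which is the 1 to 8 transition used at no cost in the outbound phase.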


After the last round of the state update transformation, the initial value or
previous chaining value H_{j-1}, the message block M_j, and the output value of
the last round are combined (xored), resulting in the output of one iteration. A
detailed description of the hash function is given in [1].

We denote the resulting state of round transformation r_i by S_i and the
intermediate states after SubBytes by S_i^{SB}, after ShiftColumns by S_i^{SC} and
after MixRows by S_i^{MR}. The initial state prior to the first round is denoted by
S_0 = M_j ⊕ K_0. The same notation is used for the key schedule with round keys
K_i with K_0 = H_{j-1}.


3 The Rebound Attack 


The rebound attack is a new tool for the cryptanalysis of hash functions and
was published by Mendel et al. in [16]. It is a differential attack. The main
idea is to use the available degrees of freedom in a collision attack to efficiently
fulfill the low probability parts in the middle of a differential trail. The rebound
attack consists of an inbound phase with a meet-in-the-middle part in order to
exploit the available degrees of freedom, and a subsequent probabilistic outbound
phase. AES-based hash functions are a natural target for this attack, since their
construction principle allows a simple application of the idea.


3.1 Basic Attack Strategy 


In the rebound attack, the compression function, internal block cipher or
permutation of a hash function is split into three sub-parts. Let W be a block
cipher, then W = W_{fw} ∘ W_{in} ∘ W_{bw}.


Fig. 1. A schematic view of the rebound attack. The attack consists of an inbound and
two outbound phases.


The rebound attack can be described by two phases (see Fig. 1):


— Inbound phase: A meet-in-the-middle phase in W_{in}, which is aided by
the degrees of freedom that are available to a hash function cryptanalyst.
This very efficient combination of meet-in-the-middle techniques with the
exploitation of available degrees of freedom is called the match-in-the-
middle approach.

— Outbound phase: In the second phase, the matches of the inbound phase
are computed in both the forward and backward direction through W_{fw} and
W_{bw} to obtain desired collisions or near-collisions. If the differential trail
through W_{fw} and W_{bw} has a low probability, one has to repeat the inbound
phase to obtain more starting points for the outbound phase.


3.2 Preliminaries for the Rebound Attack on Whirlpool 


In the following, we want to briefly summarize some well known facts that will 
be frequently used in the subsequent sections. 


— Truncated differentials: Knudsen proposed truncated differentials as a
tool in block cipher cryptanalysis. In a standard differential attack (cf. [3]),
the full difference between two inputs/outputs is considered, whereas in the
case of truncated differentials, the difference is only partially determined,
i.e. for every byte, we only check if there is a difference or not. A byte having
a non-zero difference is called active.

— Difference Propagation in MixRows: Since the MixRows operation is a linear
transformation, standard differences propagate through MixRows in a
deterministic way whereas truncated differences behave in a probabilistic way.
The MDS property of the MixRows transformation ensures that the sum of
the number of active input and output bytes is at least 9 (cf. [1]). In general,
the probability of any x → y transition with 1 ≤ x, y ≤ 8 satisfying x + y ≥ 9
is approximately 2^{(y-8)·8}. For a detailed description of the propagation of
truncated differences in MixRows we refer to [16], see also [...].

— Differential Properties of SubBytes: Let a, b ∈ {0,1}^8. For the Whirlpool
S-box, we are interested in the number of solutions to the equation


    S(x) ⊕ S(x ⊕ a) = b.    (4)


Exhaustively counting over all 2^{16} differentials shows that the number of
solutions to (4) can only be 0, 2, 4, 6, 8 and 256, which occur with frequency
39655, 20018, 5043, 740, 79 and 1, respectively. The task to return all
solutions x to (4) for a given differential (a, b) is best solved by setting up a
precomputed table of size 256 x 256 which stores the solutions (if there are
any) for each (a, b).

However, it is easy to see that for any permutation S (to be more precise,
for any injective map) the expected number of solutions to (4) is always
1. We get that 2^{-8} Σ_b #{x | S(x ⊕ a) ⊕ S(x) = b} = 2^{-8} · 2^8 = 1,
because for a fixed a, every solution x belongs to a unique b. Since the inputs
to all the S-boxes are independent, the same reasoning is valid for the full
SubBytes transformation.
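
The 256 x 256 solution table and the averaging argument can be checked with a short sketch. The Whirlpool S-box is not reproduced here; a seeded random permutation (a hypothetical stand-in) is used instead, so the frequencies of the individual entries differ from those quoted above, while the average of one solution per differential holds for any permutation.

```python
import random


def make_toy_sbox(seed=1):
    """Hypothetical stand-in S-box: a seeded random byte permutation."""
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)
    return sbox


def solution_table(sbox):
    """table[a][b] lists all x with S(x) xor S(x xor a) == b."""
    table = [[[] for _ in range(256)] for _ in range(256)]
    for a in range(256):
        for x in range(256):
            table[a][sbox[x] ^ sbox[x ^ a]].append(x)
    return table


sbox = make_toy_sbox()
table = solution_table(sbox)

# Every x lands in exactly one cell (a, b) per input difference a,
# so each row of the table holds 256 solutions in total.
total = sum(len(cell) for row in table for cell in row)
print(total / 2**16)  # 1.0
```

A lookup in `table[a][b]` then returns all values that follow the differential (a, b), which is what the match-in-the-middle step needs.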


3.3 Application to Round-Reduced Whirlpool 


In this section, we will briefly describe the application of the rebound attack 
to the hash function Whirlpool. A detailed description of the attack is given 
in [16]. For a good understanding of our results, it is recommended to study
these previous results on Whirlpool very carefully. 

The rebound attack on Whirlpool is a differential attack which uses a differ- 
ential trail with the minimum number of active S-boxes according to the wide 
trail design strategy. The core of the rebound attack on Whirlpool is a 4 round 
differential trail, where the fully active state is placed in the middle: 


1% 8 7 6473851 


In the rebound attack, one first splits the block cipher W into three sub-ciphers 
W = Wrw? Wino Wow, such that the most expensive part of the differential trail 
is covered by the inbound phase W;n. In the inbound phase, the available degrees 
of freedom (in terms of actual values of the state) are used to guarantee that 
the differential trail in Win holds. The differential trail in the outbound phase 
(Wiw, Wow) is supposed to have a relatively high probability. While standard 
XOR differences are used in the inbound phase, truncated differentials are used 
in the outbound phase of the attack. 
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outbound phase inbound phase outbound phase 


Fig. 2. A schematic view of the rebound attack on 4 rounds of Whirlpool with round 
key inputs. Black state bytes are active. 


In the following, we briefly describe the inbound and outbound phase of the 
rebound attack on 4 rounds of Whirlpool. For a more detailed description, we 
refer to the original paper [IG]. 


Inbound Phase. In the first step of the inbound phase, we choose a random
difference with 8 active bytes at the input of MixRows of round r_2 (S_2^{SC}). Note
that we need an active byte in each row of the state (see Fig. 2) to get a fully
active state after the MixRows transformation. Since AddRoundKey does not
change the difference, we get a fully active state at the input of SubBytes of
round r_3 (S_2). Then, we start with another difference in 8 active bytes at the
output of MixRows of round r_3 (S_3^{MR}) and propagate backwards. Again, since
we have an active byte in each row, we get a fully active state at the output of
SubBytes of round r_3.

In the second step of the inbound phase, the match-in-the-middle step, we
look for a matching input/output difference of the SubBytes layer of round r_3.
This is done as described in Sect. 3.2 with a precomputed 256 x 256 lookup
table. Note that we can repeat the inbound phase at most about 2^{128} times. As
indicated in Sect. 3.2, we expect one solution per trial, that is, we can produce
at most 2^{128} actual values that follow the differential trail in the inbound phase.
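
The match-in-the-middle step for one row of S-boxes can be sketched as follows. This is a simplified illustration under stated assumptions: the S-box is a seeded random permutation rather than the Whirlpool S-box, and the candidate input differences are drawn at random instead of being obtained by propagating an active byte through MixRows. All names are hypothetical.

```python
import random


def toy_sbox(seed=7):
    """Hypothetical stand-in S-box: a seeded random byte permutation."""
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)
    return sbox


def build_table(sbox):
    """Map each possible differential (a, b), a != 0, to its solutions x."""
    table = {}
    for a in range(1, 256):
        for x in range(256):
            table.setdefault((a, sbox[x] ^ sbox[x ^ a]), []).append(x)
    return table


def match_row(table, out_diff, rng, max_trials=1 << 20):
    """Try input differences until all 8 S-boxes of the row have a solution."""
    for trial in range(1, max_trials + 1):
        in_diff = [rng.randrange(1, 256) for _ in range(8)]
        cells = [table.get((a, b)) for a, b in zip(in_diff, out_diff)]
        if all(cells):
            return trial, in_diff, cells
    raise RuntimeError("no matching differential found")


sbox = toy_sbox()
table = build_table(sbox)
out_diff = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
trials, in_diff, cells = match_row(table, out_diff, random.Random(0))

# Each S-box contributes at least two values (x and x xor a), so a match
# yields at least 2^8 values for the row.
values_for_row = 1
for cell in cells:
    values_for_row *= len(cell)
print(trials, values_for_row >= 2**8)
```

Every entry of `cells` lists the byte values that follow the matched differential, so combining one value per S-box gives actual state values for which the trail through SubBytes holds.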


Outbound Phase. In contrast to the inbound phase, we use truncated
differentials in the outbound phase of the attack. By propagating the matching
differences and state values through the next SubBytes layer outwards, we get a
truncated differential in 8 active bytes in both backward and forward direction.
These truncated differentials need to propagate from 8 to 1 active byte through
the MixRows transformation, both in the backward and forward direction (see
Fig. 2). The propagation of truncated differentials through the MixRows
transformation can be modelled in a probabilistic way, see Sect. 3.2. Since we need
to fulfill one 8 → 1 transition in both the backward and forward direction, the
probability of the outbound phase is 2^{-2·56} = 2^{-112}. In other words, we have to
repeat the inbound phase about 2^{112} times to generate 2^{112} starting points for
the outbound phase of the attack.
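
The probability of the outbound phase follows directly from the estimate for truncated MixRows transitions, namely about 2^((y-8)*8) for an x to y transition with x + y at least 9. A minimal sketch of this arithmetic, with a hypothetical function name, is:

```python
def log2_transition_probability(active_in, active_out):
    """Approximate log2 probability of an x -> y MixRows transition.

    Uses the estimate 2^((y - 8) * 8) for 1 <= x, y <= 8 and x + y >= 9.
    """
    if not (1 <= active_in <= 8 and 1 <= active_out <= 8):
        raise ValueError("active byte counts must lie in 1..8")
    if active_in + active_out < 9:
        raise ValueError("transition is impossible for an MDS matrix")
    return (active_out - 8) * 8


# One 8 -> 1 transition backward and one forward.
outbound = 2 * log2_transition_probability(8, 1)
print(outbound)  # -112
```

The opposite direction, 1 to 8 active bytes, has log2 probability 0, which is why extending a trail through such a transition costs nothing.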


3.4 Previous Results on Round-Reduced Whirlpool 


Extending the 4 round trail in both the inbound and outbound phase leads
to attacks on round-reduced Whirlpool for up to 7.5 (out of 10) rounds (where
0.5 rounds consist only of SubBytes and ShiftColumns). To be more precise, by
extending the outbound phase of the attack by 0.5 and 2.5 rounds, one can
construct a collision and near-collision for the Whirlpool hash function reduced to
4.5 and 6.5 rounds, respectively. The collision attack has a complexity of about
2^{120} and the near-collision attack has a complexity of about 2^{128}. Furthermore,
by additionally extending the inbound phase of the attack by 1 round, one can
find a collision and a near-collision for the compression function of Whirlpool
reduced to 5.5 and 7.5 rounds with a complexity of 2^{120} and 2^{128}, respectively.
Note that adding this round in the inbound phase is possible, since in a
compression function attack, one can use the degrees of freedom of the key schedule
(chaining value) to guarantee that the trail in the inbound phase holds. All
results are summarized in Table 1, and for more details on these results we refer
to [16].


4 Improved Rebound Attack on the Whirlpool 
Compression Function 


In this section, we improve the inbound phase of the original rebound attack on 
Whirlpool. By using a new differential trail and extensively using the available 
degrees of freedom of the key schedule, we can add 2 additional rounds to the 
inbound phase of the attacks. The basic idea is to have two instead of one inbound 
phase (match-in-the-middle step) and connect them using the available degrees 
of freedom from the key schedule. The outbound phase of the attacks is identical
to that of the previous attacks on 5.5 and 7.5 rounds for the compression function of
Whirlpool. As a result, we obtain a collision and a near-collision attack for the
compression function of Whirlpool reduced to 7.5 and 9.5 rounds, respectively.


4.1 Inbound Phase 


In this section, we describe the improved inbound phase of the attack in detail. 
We use the following sequence of active bytes: 


    8 --r_1--> 64 --r_2--> 8 --r_3--> 8 --r_4--> 64 --r_5--> 8


In order to find inputs following the differential of the inbound phase, we split
it into two parts. In the first part, we apply the match-in-the-middle step with
active bytes 8 → 64 → 8 twice in rounds 1-2 and 4-5. In the second part, we
need to connect the resulting 8 active bytes and 64 (byte) values of the state
between round 2 and 4 using the degrees of freedom we have in the choice of the
round key values (see Fig. 3).


Inbound Part 1. In this part of the inbound phase, we apply the match-in-the-
middle step twice for rounds 1-2 and 4-5 (see Fig. 3), which can be summarized
as follows:


1. Precomputation: For the S-box, compute a 256 x 256 lookup table as
   described in Sect. 3.2.
2. Match-in-the-middle (rounds 1-2):
   (a) Start with 8 active bytes at the output of AddRoundKey in round r_2 (S_2)
       and propagate backward to the output of SubBytes in round r_2 (S_2^{SB}).
   (b) Start with 8 active bytes at the input of MixRows in round r_1 (S_1^{SC})
       and propagate forward to the input of SubBytes in round r_2 (S_1). Note
       that we can compute forward and solve the following step for each row
       independently.
   (c) Connect the input and output of the S-boxes of round r_2 by choosing
       the actual values of the state S_1, respectively S_2^{SB}, using the lookup
       table generated in the precomputation step. After repeating step (b)
       for each row about 2^8 times, we expect to find a match for the 8
       S-boxes and thus 2^8 actual values (see Sect. 3.2). Since we do this for all
       rows independently, we get about 2^{64} actual values for the full state S_1,
       respectively S_2^{SB}, such that the trail holds.
3. Match-in-the-middle (rounds 4-5): Do the same as in Step 2.


Fig. 3. The inbound phase of the attack


Hence, we get 2^{64} candidates for S_2^{SB} and 2^{64} candidates for S_4 after the first
part of the inbound phase of the attack with a complexity of about 2^8 round
transformations.


Inbound Part 2. In the second part of the inbound phase, we have to connect
the 8 active bytes (64 bit conditions) as well as the actual values (512 bit
conditions) of S_2^{SB} and S_4 by choosing the subkeys K_2, K_3 and K_4 accordingly.
Therefore, we have to solve the following equation:


    MR(SC(SB(MR(SC(SB(MR(SC(S_2^{SB})) ⊕ K_2))) ⊕ K_3))) ⊕ K_4 = S_4    (5)

with

    K_3 = MR(SC(SB(K_2))) ⊕ C_3
    K_4 = MR(SC(SB(K_3))) ⊕ C_4.    (6)


Since we have 2^{64} candidates for S_2^{SB}, 2^{64} candidates for S_4 and 2^{512} candidates
for the 3 subkeys K_2, K_3, K_4 (because of (6)), we expect to find 2^{64} solutions.
Since S_2^{MR} = MR(SC(S_2^{SB})), we can rewrite the above equation as follows:


    MR(SC(SB(MR(SC(SB(S_2^{MR} ⊕ K_2))) ⊕ K_3))) ⊕ K_4 = S_4    (7)


Note that one can always change the order of SC and SB in the Whirlpool
block cipher without affecting the output of one round. In order to make the
subsequent description of the attack easier, we do this here and get the following
equation.


    MR(SC(SB(MR(SB(SC(S_2^{MR} ⊕ K_2))) ⊕ K_3))) ⊕ K_4 = S_4    (8)


Furthermore, MR and SC are linear transformations and hence we can rewrite
the above equation as follows:


    SB(MR(SB(S_2^* ⊕ K_2^*)) ⊕ K_3) ⊕ K_3^{SB} = X    (9)


with S_2^* = SC(S_2^{MR}), K_2^* = SC(K_2), K_3^{SB} = SB(K_3) and
X = SC^{-1}(MR^{-1}(S_4 ⊕ C_4)).
In the following, this equivalent description is used to connect the values and
differences of the two states S_2^{MR} and S_4.


Fig. 4. The second part of the inbound phase. Black state bytes are active.


Remember that the two 8-byte differences of S_2^* and X have already been
fixed due to the previous steps. Furthermore, we can choose from 2^{64} values for
each of the states S_2^* and X. Now, we use equation (9) to determine the subkey
K_2^* such that we get a solution for the inbound phase of the attack. Note that
we can solve (9) for each row of the equation independently (see Fig. 4). It can
be summarized as follows.


1. Compute the 8-byte difference and the 2^{64} values of the state S_2^* from S_2^{SB},
   and compute the 8-byte difference and the 2^{64} values of the state X from
   S_4. Note that we can compute and store the values of S_2^* and X row-by-row
   and independently. Hence, both the complexity and memory requirements
   for this step are 2^8 instead of 2^{64}.

2. Repeat the following steps for all 2^{64} values of the first row of S_2^* to get 2^{64}
   matches for S_2^* to S_4:

   (a) For the chosen value of the first row of S_2^*, forward compute the
       differences and values to the first row of S_3.

   (b) Choose the first row of the key K_2^* such that the differential of the S-box
       between S_3 and S_3^{SB} holds.

   (c) Compute the first row of K_3, S_3, K_3^{SB} and X. Since we have 2^{64} values
       for the first row of S_2^* and 2^{64} values for the first row of X, we expect to
       find a match on both sides. In other words, we have now connected the
       values and differences of the first row.

   (d) Next, we connect the values of rows 2-8 independently by a simple brute-
       force search over all 2^{64} corresponding key values of K_2^*. Since we have
       to connect 64 bit values and we test 2^{64} key values, we expect to always
       find a solution.

In total, we get 2^{64} matches connecting state S_2^* to state X with a complexity of
2^{128} and memory requirements of 2^8. In other words, with the values of S_2^*, X and
the corresponding key K_2^*, we get 2^{64} starting points for the outbound phase of the
attack. Hence, the average complexity to find one starting point for the outbound
phase is 2^{64}. It is important to note that one can construct a total of 2^{192} starting
points in the inbound phase to be used in the outbound phase of the attack.
Note that step 2 (d) can be implemented using a precomputed lookup table
of size 2^{128}. In this lookup table each row of the key K_2^* (64 bits) is saved for the
corresponding two rows of S_2^* and X (64 bits each). Using this lookup table, we
can find one starting point for the outbound phase with an average complexity
of 1. However, the complexity to generate this lookup table is 2^{128}.


4.2 Outbound Phase 


In the outbound phase of the attack, we further extend the differential path 
backward and forward. By propagating the matching differences and state values 
through the next SubBytes layer, we get a truncated differential in 8 active bytes 
for each direction. These truncated differentials need to follow a specific active 
byte pattern to result in a collision on 7.5 rounds and a near-collision on 9.5 
rounds, respectively. In the following, we will describe the outbound phase for 
the collision and near-collision attack in detail. 


Collision for 7.5 Rounds. By adding 1 round in the beginning and 1.5 rounds
at the end of the trail, we get a collision for 7.5 rounds for the compression
function of Whirlpool. In the attack, we use the following sequence of active
bytes:

    1 --r_1--> 8 --r_2--> 64 --r_3--> 8 --r_4--> 8 --r_5--> 64 --r_6--> 8 --r_7--> 1 --r_{7.5}--> 1


As described in Sect. 3.2, the propagation of truncated differentials through
the MixRows transformation is modelled in a probabilistic way. For the
differential trail to hold, we need that the truncated differentials in the outbound
phase propagate from 8 to 1 active byte through the MixRows transformation,
both in the backward and forward direction (see Fig. 5). Since the transition
from 8 active bytes to 1 active byte through the MixRows transformation has a
probability of about 2^{-56}, the probability of this part of the outbound phase is
2^{-2·56} = 2^{-112}. Furthermore, to construct a collision at the output (after the
feed-forward), the exact value of the input and output difference has to match.
Since only one byte is active (see Fig. 5), this can be fulfilled with a probability
of 2^{-8}. Hence, the probability of the outbound phase is 2^{-112} · 2^{-8} = 2^{-120}. In
other words, we have to generate 2^{120} starting points (for the outbound phase)
in the inbound phase of the attack to find a collision for the compression function
of Whirlpool reduced to 7.5 rounds.


Fig. 5. Differential trail for collision attack on 7.5 rounds

Since we can find one starting point with an average complexity of about 2^{64}
and memory requirements of 2^8, we can find a collision with a complexity of
about 2^{120+64} = 2^{184}. The complexity of the attack can be further improved
at the cost of higher memory requirements. By using a lookup table with 2^{128}
entries (generated in a precomputation step), we can find one starting point for
the outbound phase with an average complexity of 1. In other words, we can find
a collision for the compression function reduced to 7.5 rounds with a complexity
of about 2^{120}. However, the precomputation step (constructing the lookup table)
has a complexity of about 2^{128}.


Near-Collision for 9.5 Rounds. The collision attack on 7.5 rounds for the
compression function can be further extended by adding one round at the
beginning and one round at the end of the trail in the outbound phase. The result is
a near-collision attack on 9.5 rounds for the compression function of Whirlpool
with the following sequence of active bytes:


    8 --r_1--> 1 --r_2--> 8 --r_3--> 64 --r_4--> 8 --r_5--> 8 --r_6--> 64 --r_7--> 8 --r_8--> 1 --r_9--> 8 --r_{9.5}--> 8


Since the 1-byte difference at the beginning and end of the 7.5 round trail will
always result in 8 active bytes after one MixRows transformation (see Sect. 3.2),
we can go backward 1 round and forward 1 round with no additional cost.
Using the feed-forward, the positions of two active S-boxes match and cancel
each other with a probability of 2^{-16}. Hence, we get a collision in 50 and 52
bytes for the compression function of Whirlpool with a complexity of about 2^{176}
and 2^{176+16} = 2^{192}, respectively. With a precomputation step with complexity
of 2^{128} and similar memory requirement, one can find a near-collision for the
compression function of Whirlpool with a complexity of about 2^{112} (collision in
50 bytes) and 2^{128} (collision in 52 bytes), respectively.
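
The complexities quoted for the collision and near-collision attacks all follow one pattern: the number of starting points needed is the inverse of the outbound probability, and each starting point costs either about 2^64 (direct search) or about 1 (with the precomputed table). The sketch below only redoes this bookkeeping in base-2 logarithms; the function name is hypothetical.

```python
def log2_attack_complexity(log2_outbound_probability, log2_cost_per_start):
    """log2 of (starting points needed) * (cost per starting point)."""
    starting_points = -log2_outbound_probability
    return starting_points + log2_cost_per_start


DIRECT = 64   # about 2^64 per starting point, 2^8 memory
TABLE = 0     # about 1 per starting point after a 2^128 precomputation

print(log2_attack_complexity(-120, DIRECT))       # 184: collision, 7.5 rounds
print(log2_attack_complexity(-120, TABLE))        # 120
print(log2_attack_complexity(-112, DIRECT))       # 176: near-collision, 50 bytes
print(log2_attack_complexity(-112 - 16, DIRECT))  # 192: near-collision, 52 bytes
print(log2_attack_complexity(-112, TABLE))        # 112
print(log2_attack_complexity(-112 - 16, TABLE))   # 128
```

The table-based variants trade the factor 2^64 per starting point for a one-time precomputation of about 2^128 in time and memory.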


Fig. 6. In the attack on 9.5 rounds we extend the trail one more round at the beginning
and at the end of the outbound phase to get a near-collision of Whirlpool


5 A Subspace Distinguisher for 10 Rounds 


In this section, we present the first cryptanalytic result on the full Whirlpool
compression function. The previous result on 9.5 rounds is extended to the full
10 rounds of the compression function by defining a different attack scenario.
Instead of aiming for a near-collision, we are interested in distinguishing the
Whirlpool compression function from a random function. For this, we introduce
a new kind of distinguishing attack, a so-called subspace distinguisher. In the
following, $\mathbb{F}_2 = GF(2)$ always denotes the finite field of order 2.

For the subspace distinguishing attack, we consider the following problem: 


Problem 1. Given a function $f$ mapping to $\mathbb{F}_2^N$, try to find $t$ input pairs such
that the corresponding output differences belong to a vector space of dimension
at most $n$ for some $n < N$.


Remark. We define Problem 1 in this generic way in order to make it more
generally applicable. This will be shown in the extended version of this paper.


5.1 Solving Problem 1 for the Whirlpool Compression Function


In this section, we show how the compression function attack described in the
previous section can be used to distinguish the full Whirlpool compression
function from a random function.

Obviously, the difference between two Whirlpool states can be seen as a vector
in the vector space of dimension $N = 512$ over $\mathbb{F}_2$. The crucial observation is
that the near-collision attack on 9.5 rounds can be interpreted as an algorithm
that can find $t$ difference vectors in $\mathbb{F}_2^{512}$ (output differences of the
compression function) that form a vector space of dimension $n \le 128$.

To see this, observe that by extending the differential trail from 9.5 to 10
rounds, the 8 active bytes at the end of the 9.5-round trail will always result
in a fully active state $S_{10}$ due to the properties of the MixRows
transformation. Thus the near-collision is destroyed. However, if we look again
at Fig. 6, the differences in $M_t$ and the differences in the state at the end
of the 9.5-round trail can be seen as (difference) vectors belonging to
subspaces of $\mathbb{F}_2^{512}$ of dimension at most 64.

Even though after the application of MixRows and AddRoundKey the state $S_{10}$
is fully active in terms of truncated differentials, the differences in $S_{10}$ still
belong to a subspace of $\mathbb{F}_2^{512}$ of dimension at most 64 due to the properties of
MixRows. Therefore, after the feed-forward, we can conclude that the differences
at the output of the compression function form a subspace of $\mathbb{F}_2^{512}$ of dimension
$n \le 128$.
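The dimension argument can be illustrated with a toy computation. The following sketch is not part of the attack: it only assumes that difference vectors are stored as Python integers (bit vectors over GF(2)), uses a hypothetical helper `gf2_rank`, and works in a toy 32-bit space instead of $N = 512$. It shows that XOR sums of vectors from two subspaces of dimension at most 4 each never span more than 8 dimensions, whereas the same number of random vectors spans far more.

```python
import random

def gf2_rank(vectors):
    """Dimension of the GF(2)-span of integers interpreted as bit vectors."""
    pivots = {}  # leading bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)

def random_element(basis, rng):
    """Random GF(2)-linear combination of the given basis vectors."""
    v = 0
    for b in basis:
        if rng.getrandbits(1):
            v ^= b
    return v

rng = random.Random(2009)   # fixed seed so the run is reproducible
N_TOY = 32                  # toy dimension (hypothetical; Whirlpool has N = 512)
basis_a = [rng.getrandbits(N_TOY) for _ in range(4)]  # subspace of dimension <= 4
basis_b = [rng.getrandbits(N_TOY) for _ in range(4)]  # subspace of dimension <= 4

# Mimic the feed-forward: each "output difference" is the XOR of one vector
# from each subspace, so it lies in a space of dimension at most 4 + 4 = 8.
structured = [random_element(basis_a, rng) ^ random_element(basis_b, rng)
              for _ in range(200)]
unstructured = [rng.getrandbits(N_TOY) for _ in range(200)]

structured_rank = gf2_rank(structured)
random_rank = gf2_rank(unstructured)
print("structured:", structured_rank, "random:", random_rank)
```

The rank computation is plain Gaussian elimination on the leading bits, which is all a distinguisher needs in order to check whether a set of observed output differences stays inside a low-dimensional space.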

Hence, we can use the near-collision attack on 9.5 rounds to find $t$ difference
vectors forming a vector space of dimension $n \le 128$ with a complexity of
$t \cdot 2^{176}$, or $t \cdot 2^{112}$ using a precomputation step with complexity $2^{128}$.
Note that $t < 2^{192-112} = 2^{80}$ due to the remaining degrees of freedom in the
inbound phase of the attack.

Now the main question is for which values of $t$ our attack is more efficient
than a generic attack. In other words, how do we have to choose $t$ such that we
can distinguish the compression function of Whirlpool from a random function?
To answer this, we first have to bound the complexity of the generic attack.
This is described in the next section.


5.2 Solving Problem 1 for a Random Function


Remarks on the Security Model. In order to discuss generic attack scenarios,
we have to choose a security model. We adopt the black box model introduced by
Shannon [23]. In this model, a block cipher can be seen as a family of functions
parameterized by the secret key $k \in K$, that is,
$E : \{0,1\}^{|k|} \times \{0,1\}^N \to \{0,1\}^N$, where for each $k \in K$, $E_k$ is seen as a
uniformly chosen random permutation on $\{0,1\}^N$.

In B] it was shown that an ideal block cipher based hash function in the
Miyaguchi-Preneel mode is collision resistant and non-invertible. Based on this,
we model our compression function $f$ as a black box oracle to which only forward
queries are admissible. We also want to note that in all of the following, when
we talk about complexity, we mean query complexity. Note that the practical
complexity is always greater than or equal to the query complexity.


The Generic Approach. In this generic approach, the only property of $f$ that is
used is the fact that the outputs of $f$ are contained in the vector space $\mathbb{F}_2^N$.

Let us now assume that an adversary makes $Q$ queries to the function $f$.
Assuming that $Q \ll 2^{N/2}$, we thus get $K = \binom{Q}{2}$ differences ($\in \mathbb{F}_2^N$) coming from
these $Q$ queries. For given $n$ and $t \gg n$, we now want to calculate the probability
that among these $K$ difference vectors, we have $t$ vectors that span a space of
dimension less than or equal to $n$.

We will need the following fact about matrices over finite fields. Let $E(t, N, d)$
denote the number of $t \times N$ matrices over $\mathbb{F}_2$ that have rank equal to $d$. Then,
it is well known (cf. R or [[3J) that 


$$E(t, N, d) = \prod_{i=0}^{d-1} (2^N - 2^i) \cdot \binom{t}{d}_2 = \prod_{i=0}^{d-1} \frac{(2^N - 2^i)(2^t - 2^i)}{2^d - 2^i}, \qquad (10)$$


where $\binom{t}{d}_2$ denotes the $q$-binomial coefficient with $q = 2$.
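As a sanity check, the counting formula can be compared against exhaustive enumeration for tiny parameters. The sketch below assumes the product form $E(t,N,d) = \prod_{i=0}^{d-1} (2^N - 2^i)(2^t - 2^i)/(2^d - 2^i)$; the function names are hypothetical, and matrix rows are encoded as $N$-bit integers.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integers."""
    pivots = {}
    for v in rows:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def count_rank_formula(t, N, d):
    """Number of t x N matrices over GF(2) of rank d via the product formula."""
    num, den = 1, 1
    for i in range(d):
        num *= (2**N - 2**i) * (2**t - 2**i)
        den *= (2**d - 2**i)
    return num // den  # the quotient is always an integer

def count_rank_bruteforce(t, N):
    """Histogram of ranks over all 2^(t*N) matrices (only feasible when tiny)."""
    counts = [0] * (min(t, N) + 1)
    for rows in product(range(2**N), repeat=t):
        counts[gf2_rank(rows)] += 1
    return counts

t, N = 3, 4
brute = count_rank_bruteforce(t, N)
formula = [count_rank_formula(t, N, d) for d in range(min(t, N) + 1)]
print(brute, formula)
```

For $t = 3$, $N = 4$ the enumeration covers only $2^{12}$ matrices, so the check runs instantly; the real parameters ($N = 512$) are of course far out of reach of brute force.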


Proposition 1. Let $n, t, N \in \mathbb{N}$ be given such that $t > N > n$. We assume a
set of $K$ vectors chosen uniformly at random from $\mathbb{F}_2^N$. Let $\Pr(K, t, N, n)$ denote
the probability that $t$ of these $K$ vectors span a space of dimension not larger
than $n$. Then, we have


$$\Pr(K, t, N, n) \le \binom{K}{t} 2^{-tN} \sum_{d=1}^{n} E(t, N, d) \qquad (11)$$

$$\le \frac{1}{\sqrt{2\pi t}} \left(\frac{Ke}{t}\right)^{t} 2^{-(N-n)(t-n)+(n+1)}. \qquad (12)$$


Proof. Based on the definition of $E(t, N, d)$, it is easy to see that (11) is an upper
bound for $\Pr(K, t, N, n)$.

Computing the second bound consists of two steps: bounding the binomial
coefficient and bounding the rest. We get


$$2^{-tN} \sum_{d=1}^{n} E(t, N, d) < 2^{-tN} \cdot 2 \cdot E(t, N, n) \qquad (13)$$

$$= 2^{-tN+1} \prod_{i=0}^{n-1} \frac{(2^t - 2^i)(2^N - 2^i)}{2^n - 2^i} \qquad (14)$$

$$< 2^{-tN+1} \left(2^{n-1} \cdot 2^{t-(n-1)} \cdot 2^{N-(n-1)}\right)^{n} \qquad (15)$$

$$= 2^{-(t-n)(N-n)+(n+1)}. \qquad (16)$$


These inequalities are based on two facts. First, it is easy to show that for
$t > N > n$, we have $E(t, N, n) < \sum_{d=1}^{n} E(t, N, d) < 2 \cdot E(t, N, n)$. This can be
proven by using induction over $n$ and elementary properties of the $q$-binomial
coefficient. Second, (15) follows from the fact that the function defined by
$f(x) = (2^t - x)(2^N - x)/(2^n - x)$ is strictly increasing on the interval
$x \in (0, 2^{n-1}]$.

For the binomial coefficient $\binom{K}{t}$ we combine the simple estimate $\binom{K}{t} \le K^t/t!$
with the following inequality based on Stirling’s formula BA:


$$\sqrt{2\pi}\, t^{t+\frac{1}{2}} e^{-t+\frac{1}{12t+1}} < t! < \sqrt{2\pi}\, t^{t+\frac{1}{2}} e^{-t+\frac{1}{12t}}. \qquad (17)$$


From this we get $\binom{K}{t} < \frac{1}{\sqrt{2\pi t}} \left(\frac{Ke}{t}\right)^{t}$, and with (16), this proves the
proposition. □


As a corollary, we can give a lower bound for the number of random vectors 
needed to fulfill the conditions of the proposition with a certain probability. 


Corollary 1. For a given probability $p$ and given $N, n, t$ as in Proposition 1,
the number $K$ of random vectors needed to contain $t$ vectors spanning a space of
dimension not larger than $n$ with a probability $p$ is lower bounded by


$$K \ge \frac{t}{e} \left(p \sqrt{2\pi t}\right)^{\frac{1}{t}} 2^{\frac{(N-n)(t-n)-(n+1)}{t}}, \qquad (18)$$


and the number of queries $Q$ to $f$ needed to produce $t$ vectors spanning a space
of dimension not larger than $n$ with a probability $p$ is lower bounded by


$$Q \ge \sqrt{\frac{2t}{e} \left(p \sqrt{2\pi t}\right)^{\frac{1}{t}} 2^{\frac{(N-n)(t-n)-(n+1)}{t}}}. \qquad (19)$$


Proof. Equation (18) follows immediately from (12), and (19) follows from setting
$K = \binom{Q}{2} = Q(Q-1)/2$ in (18). □
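To get a feeling for the numbers, the bound can be evaluated for the Whirlpool parameters. The sketch below is an illustration under stated assumptions: it takes the lower bound in the form $K \ge \frac{t}{e}(p\sqrt{2\pi t})^{1/t}\, 2^{((N-n)(t-n)-(n+1))/t}$ together with $Q \ge \sqrt{2K}$, and compares it with the cost $t \cdot 2^{176}$ of the dedicated attack without precomputation. The function names are hypothetical.

```python
import math

def log2_K_lower_bound(N, n, t, p=1.0):
    """log2 of the lower bound on the number K of random difference vectors."""
    return (math.log2(t / math.e)
            + (math.log2(p) + 0.5 * math.log2(2 * math.pi * t)) / t
            + ((N - n) * (t - n) - (n + 1)) / t)

def log2_Q_lower_bound(N, n, t, p=1.0):
    """log2 of the lower bound on the number of queries Q."""
    # K = Q(Q-1)/2 implies Q^2 > 2K, hence log2(Q) > (1 + log2(K)) / 2.
    return 0.5 * (1.0 + log2_K_lower_bound(N, n, t, p))

N, n = 512, 128
for log2_t in range(9, 17):
    generic = log2_Q_lower_bound(N, n, 2 ** log2_t)
    dedicated = log2_t + 176          # log2 of t * 2^176
    verdict = "dedicated attack is cheaper" if dedicated < generic else "no advantage"
    print(f"log2(t)={log2_t:2d}  generic>=2^{generic:.1f}  dedicated=2^{dedicated}  {verdict}")
```

Under these assumptions the generic bound only overtakes the dedicated cost once $t$ is large enough, which is the trade-off discussed in the next section.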


5.3 Complexity of the Distinguishing Attack 


Table 2 shows the complexities of the generic approach and our dedicated
approach for several values of $t$. As can be seen in the table, one can distinguish
the full Whirlpool compression function from random with a complexity of about
$2^{188}$ with $t = 2^{12}$ (or $2^{121}$ with $t = 2^{9}$ using a precomputation table). In other
words, when performing $2^{188}$ queries to a random function, Proposition 1 shows
that the probability for solving Problem 1 for $t = 2^{12}$ is $\ll 1$. To the best of our
knowledge this is the first result on the full Whirlpool compression function.


Table 2. Values for $t$, $Q$ (query complexity), $C$ (complexity of our attack), and
$C_p$ (complexity of our attack with precomputation) for $p = 1$, $N = 512$, $n = 128$


log2(t)   log2(Q)   log2(C)   log2(Cp)
 9        ?         ?         ?
10        ?         186       122
11        ?         187       123
12        ?         188       124
13        ?         ?         ?
14        ?         190       126
15        ?         191       127
16        ?         192       128


6 Conclusion 


In this paper, we have proposed a new kind of distinguishing attack for cryptanal- 
ysis of hash functions. We have successfully attacked the Whirlpool compression 
function. To the best of our knowledge this is the first attack on full Whirlpool. 

We have obtained this result by improving the rebound attack on reduced 
Whirlpool. First, the inbound phase of the rebound attack was extended by up 
to two rounds using the available degrees of freedom from the key schedule. This 
resulted in a near-collision attack on 9.5 rounds of the compression function 
of Whirlpool. Second, we have shown how to turn this rebound near-collision 
attack into a distinguishing attack for the full 10 round compression function of 
Whirlpool. 

The idea seems applicable to a wider range of hash function constructions. 
In particular, the attacks described in this paper can be applied to the hash 
function Maelstrom [$| in a straightforward manner because of the similarity to
Whirlpool (see also [I6]). Several SHA-3 candidates are a natural target for this 
new kind of attack, see for instance [[4§15]. Furthermore, subspace distinguishers 
can be applied to block ciphers as well. This will be discussed in an extended 
version of this paper. 


A Attacks on the Hash Function


In this section, we present a collision and near-collision for the Whirlpool hash
function. The attacks are a straightforward extension of the collision and
near-collision attack on 4.5 and 6.5 rounds of Whirlpool presented in [6]. By adding
one round in the inbound phase we can find a collision and a near-collision for 
Whirlpool reduced to 5.5 and 7.5 rounds, respectively. The core of the attack is a 
5 round differential trail, where two fully active states are placed in the middle: 


$$1 \xrightarrow{r_1} 8 \xrightarrow{r_2} 64 \xrightarrow{r_3} 64 \xrightarrow{r_4} 8 \xrightarrow{r_5} 1$$


Since the outbound phase of the attacks is identical to the previous attacks (see
Sect. Æ), we only discuss the inbound phase of the attack here (see Fig. 7).


Fig. 7. The inbound phase of the collision attack and near-collision attack on the hash 
function 


It can be summarized as follows. 


1. Precomputation: For the S-box, compute a 256 x 256 lookup table as de- 
scribed in Sect. 

2. Start with 8 active bytes (differences) at the input of MixRows in round $r_2$
($S_2^{SC}$) and propagate forward to the input of SubBytes in round $r_3$ ($S_2$).

3. Start with 8 active bytes at the output of MixRows in round $r_4$ ($S_4^{MR}$) and
propagate backward to the output of SubBytes in round $r_4$ ($S_4^{SB}$).

4. Next we have to connect the states $S_2$ and $S_4^{SB}$ such that the differential trail
holds. In other words, we have to find the actual values for $S_2$ such that:


$$SB(MR(SC(SB(S_2))) \oplus K_3) \oplus SB(MR(SC(SB(S_2 \oplus \Delta_1))) \oplus K_3) = \Delta_2$$


where $\Delta_1$ denotes the active bytes (differences) in $S_2$ and $\Delta_2$ denotes the
active bytes (differences) in $S_4^{SB}$. In the following, we will show how this
equation can be solved with a complexity of about $2^{64}$ by solving the equation
for sets of 8 bytes independently. It can be summarized as follows.

(a) For all $2^{64}$ values of $S_2[0,0], S_2[1,7], \ldots, S_2[7,1]$ compute the first row of
$S_4^{SB}$ and check if the above equation holds. Note that due to ShiftColumns,
these bytes are shifted to the first row of $S_3^{SC}$ and MixRows works on each
row independently. In other words, we get $2^{64}$ candidates for each row
of $S_4^{SB}$. Hence, after testing all $2^{64}$ candidates for the first row of $S_4^{SB}$ we
expect to find a match for the first row of $\Delta_2$.

(b) Do the same for the corresponding 8 bytes for rows 2-8 of $S_4^{SB}$.

After testing each set of 8 bytes independently, we will find a state $S_2$ such
that the differential trail is connected. Finishing this step of the attack has
a complexity of about $8 \cdot 2^{64}$ MixRows ($\approx 2^{64}$ round computations).


Hence, we can compute one starting point for the outbound phase with a
complexity of about $2^{64}$. Note that the complexity of the inbound phase can be
significantly reduced at the cost of higher memory requirements. By saving $2^{64-s}$
candidates for $S_4^{SB}$ in a list, we can do a standard time/memory tradeoff with a
complexity of about $2^{120+s}$ and memory requirements of $2^{64-s}$. By setting $s = 0$
we can find $2^{64}$ starting points with a complexity of $2^{64}$ and similar memory
requirements of $2^{64}$.

Hence, we can find a collision for Whirlpool reduced to 5.5 rounds with a 
complexity of about 2!2° and a near-collision for 7.5 rounds in 50 (respectively 
52) bytes with a complexity of about 2!%° and 21? (respectively 2178). All attacks 
have memory requirements of 2°. 


Abstract. We consider a long standing problem in cryptanalysis: attacks
on hash function combiners. In this paper, we propose the first
attack that allows collision attacks on combiners with a runtime below 
the birthday-bound of the smaller compression function. This answers 
an open question by Joux posed in 2004. 

As a concrete example we give such an attack on combiners with the 
widely used hash function MD5. The cryptanalytic technique we use 
combines a partial birthday phase with a differential inside-out tech- 
nique, and may be of independent interest. This potentially reduces the 
effort for a collision attack on a combiner like MD5||SHA-1 for the first 
time. 


Keywords: hash functions, cryptanalysis, MD5, combiner, differential. 


1 Introduction 


The recent spurt of cryptanalytic results on popular hash functions like MD5
and SHA-1 suggests that they are (much) weaker than originally anticipated,
especially with respect to collision resistance. It seems non-trivial to
propose a concrete hash function which inspires long-term confidence, even more
so as we seem unable to construct collision resistant primitives from potentially
simpler primitives [27]. Hence constructions that allow one to hedge bets, like
concatenated combiners, are of great interest. Before we give a preview of our
results, we first review work on combiners.


Review of work on combiners. The goal of combiners is to have at least some
bound on the expected security even if (some of the) hash functions get broken,
for various definitions of “security” and “broken”. Joux [I2] showed (by using
multi-collisions) that the collision resistance of a combiner cannot be expected
to be much higher than the birthday bound of the component (i.e., hash function)
with the largest output size.

On the other hand, combiners seem to be very robust when it comes to collision
security up to the birthday bound (of the component with the smallest output
size): By using techniques similar to Coron et al. [B], Hoch and Shamir [LJ]
showed that only very mild assumptions on a compression function are needed to
achieve a collision resistance of at least $O(2^{n/2})$. In fact, using a model proposed
by Liskov [[5], they show that none of the compression functions needs to be
collision resistant or preimage resistant in the usual sense.


Motivation: cryptanalysis of combiners. Concatenating the outputs of hash
functions is often used by implementors to “hedge bets” on hash functions. A
combiner of the form MD5||SHA-1 as used in SSL 3.0/TLS 1.0 and TLS 1.1
is an example of such a strategy. Let’s assume we are given a combiner of the
form MD5||SHA-1. Let’s further assume that a breakthrough in cryptanalysis of
SHA-1 brings down the complexity of a collision search attack to $2^{52}$. We know
that the best collision search attacks on MD5 are as fast as 2° [29]. So what is
the best collision attack on the combiner? The best known method due to Joux
is only as good as a birthday attack on the smaller of the two hash functions in
the combiner. There is no known method which would allow one to reduce the
total effort below this bound, i.e. $2^{64}$.

Currently, the best solution at our disposal is to combine the (hypothetical)
SHA-1 attack with Joux’s multicollision approach. Find a $2^{64}$-multicollision for
SHA-1 with effort $2^{52} \cdot 64 = 2^{58}$, and then perform a birthday-type search in
this $2^{64}$-multicollision to single out a collision which also collides for MD5. The
total effort will be $2^{64}$. In fact, reductions of the effort for SHA-1 collision search
will only marginally improve the attack on the combiner. How to improve upon this?
Analyzing the combiner as a whole may be prohibitively complicated. The
resistance of two-pipe designs with sufficiently different pipes like RIPEMD-160
against recent collision search attacks also gives hints in this direction.


Preview of our results: We propose a new method that allows a cryptanalyst
to focus on the hash functions individually while still potentially allowing attacks
on combiners with a runtime below the birthday-bound of the smaller compression
function. This also answers an open question by Joux posed in 2004 [J].
For this, we start with definitions in Section 2. In Section 3 we give a high-level
description of our attack strategy on a concatenation combiner without going
into the details of a particular compression function. Next, we consider as a
concrete cryptanalytic example combiners that use MD5. We first give an
alternative description of MD5 in Section 4, which will turn out to be beneficial
(and in fact, as our experiments suggest, necessary) in Section 5, where we describe
the cryptanalytic techniques we need to be able to use the high-level attack
description.

For the cryptanalysis, we employ a combination of a birthday-style attack and 
a differential inside-out technique that uses different parts of a collision charac- 
teristic at different stages of an attack, both before and after a birthday phase. 
The differential technique may be of independent interest, also for improving 
known types of collision attacks on MD5, or for finding one-block collisions. In 
Section B] we give practical results which allow us to estimate the actual secu- 
rity MD5 is able to give in a combiner. Finally, we conclude and discuss open 
problems in Section [J


2 Definitions


For the remainder of the paper, we give a few definitions. We give a classification
of collision attacks on compression functions and hash functions. Let an iterated
hash function $F$ be built by iterating a compression function
$f : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ as follows:


— Split the message $M$ of arbitrary length into $k$ blocks $x_i$ of size $m$.
— Set $h_0$ to a pre-specified IV.
— Compute $\forall x_i : h_i = f(h_{i-1}, x_i)$.
— Output $F(M) = h_k$.
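The iteration above can be sketched in a few lines. This is only an illustration: `toy_compress`, the block size, and the chaining value size are hypothetical stand-ins (truncated SHA-256 is used purely as a convenient black box), and padding is omitted.

```python
import hashlib

BLOCK_BYTES = 8    # toy block size m in bytes (hypothetical)
STATE_BYTES = 4    # toy chaining value size n in bytes (hypothetical)
IV = bytes(STATE_BYTES)

def toy_compress(h, x):
    """Stand-in for the compression function f(h_{i-1}, x_i); not a real design."""
    return hashlib.sha256(h + x).digest()[:STATE_BYTES]

def iterated_hash(message, iv=IV, f=toy_compress):
    """F(M): split M into blocks x_1..x_k and iterate h_i = f(h_{i-1}, x_i)."""
    if len(message) % BLOCK_BYTES:
        raise ValueError("sketch omits padding: length must be a multiple of the block size")
    h = iv
    for i in range(0, len(message), BLOCK_BYTES):
        h = f(h, message[i:i + BLOCK_BYTES])
    return h

prefix, suffix = b"A" * 8, b"B" * 16
# The digest of a prefix is exactly the chaining input seen by the suffix.
assert iterated_hash(prefix + suffix) == iterated_hash(suffix, iv=iterated_hash(prefix))
```

The final assertion is the property that the attack types below rely on: a prefix influences the rest of the computation only through the chaining value it produces.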


Classification for compression function collision attacks. Higher numbers mean
fewer degrees of freedom for an attacker and are hence more difficult to obtain
cryptanalytically.


— Compression collision attacks of type 0
Compute $h_{i-1}$, $h_{i-1}^*$, $m_i$ and $m_i^*$ s. t. $f(h_{i-1}, m_i) = f(h_{i-1}^*, m_i^*)$. Note that
early attacks by den Boer and Bosselaers [I], and Dobbertin [9] on MD5 are
of this type.
— Compression collision attacks of type 1
Given $h_{i-1}$, compute $m_i$ and $m_i^*$ s. t. $f(h_{i-1}, m_i) = f(h_{i-1}, m_i^*)$.
— Compression collision attacks of type 2
Given $h_{i-1}$ and $h_{i-1}^*$, compute $m_i$ and $m_i^*$ s. t. $f(h_{i-1}, m_i) = f(h_{i-1}^*, m_i^*)$.
— Compression collision attacks of type 3
Given $h_{i-1}$ and $h_{i-1}^*$, compute $m_i$ s. t. $f(h_{i-1}, m_i) = f(h_{i-1}^*, m_i)$.


Later in the paper, it will be useful to have a weakened version of the collision 
attack on the compression function of type 3. 


— Compression collision attacks of type 3w
Given $h_{i-1}$ and $h_{i-1}^*$ from an efficiently enumerable subset $s$ (of size
$|s| = 2^{n-z}$) of all $2^{2n}$ possible pairs $(h_{i-1}, h_{i-1}^*)$, compute $m_i$ s. t.
$f(h_{i-1}, m_i) = f(h_{i-1}^*, m_i)$.


Complementing types 1-3 of the compression function attacks, one may define
similar attack settings for the hash function as well. For the sake of concreteness,
we also give examples related to MD5.


— Hash collision attacks of type 1: Given $m_0$, compute $m_1$ and $m_1^*$ such
that $F(m_0||m_1) = F(m_0||m_1^*)$. This is the simplest way to violate the
collision resistance of a hash function. For MD5, see Wang et al. [BI]. The
prefix $m_0$ may be the string of length 0, or any other message block.

— Hash collision attacks of type 2: Given $m_0$ and $m_0^*$, compute $m_1$ and
$m_1^*$ such that $F(m_0||m_1) = F(m_0^*||m_1^*)$. This type of attack is much more
demanding from a cryptanalytic view as it needs to cope with arbitrary
prefixes and hence arbitrary chaining input differences (Stevens et al. [28]).
In turn it allows much more powerful attacks, as can be seen by the recent
attacks on certificate authorities using MD5 [29].

— Hash collision attacks of type 3 (new, in this paper): Given $m_0$ and
$m_0^*$, compute $m_1$ such that $F(m_0||m_1) = F(m_0^*||m_1)$. This type of attack
is in turn much more difficult than type 2, as it halves the degrees of freedom
available to an attacker. The message difference is fixed (to zero); this
means that for each MD5 compression function, instead of 1024 degrees of
freedom, only 512 degrees of freedom via the message input are available to
an attacker.


This leads us to the informal definition of a weak hash function, complementing
the concept of a weak compression function from [I5]. A weak hash function
may be modeled as a random oracle, but additionally offers oracles that allow
collision attacks on the hash function of type 1 and type 2, but not of type 3.
The purpose of introducing a weak hash function is to show that MD5
cannot even meet the requirements of a weak hash function, even though no
type 3 collision attacks on the MD5 compression function are known.

We may define the security of an $n$-bit hash function as a component in a
concatenated combiner against collision attacks (concatenated combiner collision
security, or simply $C^3$ security) as the effort to find a collision attack of
type 3. For MD5, despite all cryptanalytic advances in recent years, this is
$2^{64}$. In this paper, we show an attack suggesting that the $C^3$ security of MD5
is less.


3 Outline of Attack Strategies 


In the following we assume it is possible to devise collision attacks of type 3w
on the compression function below the birthday bound. These collision attacks
will need a suitable differential path, and a method to find a message pair which
conforms to such a differential path. We will discuss this problem for the case of
MD5 in Section 5. This alone is not enough for our attack to work, but based
on such a result we propose to continue as follows. We first show how to devise
a collision attack of type 3 on a hash function using a combination of birthday
techniques and differential shortcut techniques. Then we continue and apply such
an attack on a combiner.


3.1 Collision Attack of Type 3 


The attack we propose (see Fig. 1 for an illustration) consists of three phases:
a preparation phase that computes target differences (1), a birthday phase (using
$M_1$) (2), and a differential phase (using $M_2$) that performs a type 3w collision
attack (3). The phases are executed in this order.

Before the birthday phase (2), the differential phase needs to be “prepared”
as follows (1). We generate a number of $2^x$ distinct characteristics (also called
paths) through the compression function on a heap with the following property:
no message difference, an arbitrary input difference ($\delta_2$), and no output difference
($\delta_2 \boxplus \delta_3 = 0$). Let’s assume each of them, when given a suitable chaining input
pair, results in an effort of $2^w$ (or less) to find a conforming message pair. Let $2^y$
be the cost of this path generation in terms of equivalent compression function
computations. Let’s further assume that each of these paths has an average
number of $z$ independent conditions on the chaining input (CI).


Fig. 1. Outline of attack strategy

A single path with $z$ conditions on the CI in fact can be used for $2^{n-z}$ possible
pairs of CIs. Since there exist $2^{2n}$ pairs, $2^{n+z}$ randomly generated pairs would
be needed before one matches the CI described by the path ($\delta_1$ matches $\delta_2$, and
the conditions are fulfilled). Using birthday techniques, this is expected to take
$2^{(n+z)/2}$ time. Given all $2^x$ paths, only $2^{n+z-x}$ randomly generated pairs are
needed, which in turn is expected to take $2^{(n-x+z)/2}$ time. Hence, if $x > z$, the
runtime is expected to be below the birthday bound.

For obtaining a single hash collision of type 3, the overall method may be
seen as a successful cryptanalytic attack if the sum of the runtimes for the path
generation, the birthday phase, and the work to find a conforming message pair
using a particular path is below the birthday bound, i.e. if
$2^y + 2^{(n-x+z)/2} + 2^w < 2^{n/2}$. For obtaining many hash collisions of type 3, the
effort to generate the heap of paths (1) may be negligible, hence the goal would
be reduced to $2^{(n-x+z)/2} + 2^w < 2^{n/2}$.
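The counting argument of this subsection can be written out as a short worked derivation. It is a sketch under two assumptions that are not spelled out in the text: each path fixes the input difference and imposes $z$ independent bit conditions on the chaining input, and the $2^x$ paths match disjoint sets of chaining input pairs. Constant factors are ignored, and the numbers in the final comment are purely hypothetical.

```latex
\begin{align*}
\Pr[\text{a random CI pair fits one given path}]
  &= \frac{2^{\,n-z}}{2^{\,2n}} = 2^{-(n+z)}, \\
\Pr[\text{a random CI pair fits one of the } 2^{x} \text{ paths}]
  &\approx 2^{x} \cdot 2^{-(n+z)} = 2^{-(n+z-x)}, \\
\#\{\text{pairs formed by } q \text{ chaining values}\} = \binom{q}{2} \approx \frac{q^{2}}{2}
  &\;\Longrightarrow\; q \approx 2^{(n+z-x)/2}.
\end{align*}
% Hypothetical example: n = 128, x = 20, z = 10 gives a birthday phase of
% about 2^{(128 + 10 - 20)/2} = 2^{59}, which is below 2^{n/2} = 2^{64}.
```

The last line makes the role of the condition $x > z$ visible: every additional path doubles the number of acceptable chaining input pairs, while every additional condition halves it.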


3.2 Attack on the Combiner $F_1(M)||F_2(M)$


We now discuss how to use a type 3 collision attack on a hash function to devise
an attack on a combiner of two hash functions using it, where the first of the two
hash functions suffers from a type 1 collision attack.

The setting: Let $F_1(\cdot)$ and $F_2(\cdot)$ be two hash functions with output sizes $n_1$
and $n_2$. For the sake of simplicity we assume in the following that $n_1 = n_2 = n$.
Let’s further assume that $F_1$ suffers from a type 1 collision attack, i.e. given
$m_0$, let the effort to find an $m_1$ and $m_1^*$ such that $F_1(m_0||m_1) = F_1(m_0||m_1^*)$ be
$2^{c_1} < 2^{n/2}$. Furthermore, assume that $F_2$ suffers from a type 3 collision attack,
i.e. given $m_2$ and $m_2^*$, let the effort to compute $m_3$ such that
$F_2(m_2||m_3) = F_2(m_2^*||m_3)$ be $2^{c_2} < 2^{n/2}$. In more detail, as noted above,
$2^{n+z-x}$ randomly generated pairs $(m_2, m_2^*)$ are needed. The introduced symbols are
summarized in Table 1.

We are now ready to formulate the new collision attack on the combiner
$F_1(M)||F_2(M)$ that combines both attacks. It is also illustrated in Fig. 2.


Table 1. Symbols used in the description of the attack


symbol   description
n        output size
w        log2 of the cost of finding a conforming message pair
x        log2 of the number of distinct characteristics
y        log2 of the cost of the preparatory path generation
z        number of conditions on the chaining input
c1       log2 of the cost of type 1 collision attack on the first hash function
c2       log2 of the cost of type 3 collision attack on the second hash function


(a) The known approach due to Joux does not allow one to exploit shortcut
collision attacks on both hash functions. The lower bound is hence a birthday
attack on the “smaller” hash function.

(b) New collision construction using type 3 collisions allows one to exploit
shortcut attacks in both hash functions without considering the interaction
in the cryptanalysis.


Fig. 2. Comparison of collision attack on a combiner


1. Let $m_0$ be the string of size zero and perform the type 1 collision attack on
$F_1$ and obtain a pair $(m_1^1, m_1^{1*})$ such that $F_1(m_1^1) = F_1(m_1^{1*})$. Note that
$F_2(m_1^1)$ does not collide with $F_2(m_1^{1*})$.

2. Repeat the step above $(n + z - x)/2 - 1$ times while replacing $m_0$ with the
concatenation of all previously found messages. This means, for the $i$-th
step (for $i = 2 \ldots (n+z-x)/2$), let $m_0 = m_1^1||\ldots||m_1^{i-1}$ and obtain a pair
$(m_1^i, m_1^{i*})$ such that $F_1(m_0||m_1^i) = F_1(m_0||m_1^{i*})$.

3. Note that by using Joux’s multicollision method, we have produced a
$2^{(n+z-x)/2}$-collision for $F_1$.

4. Perform the type 3 attack on $F_2$ as follows. For the birthday part of the type
3 attack, use the $(n+z-x)/2$ collisions in $F_1$ to obtain the required $2^{n+z-x}$
pairs of prefixes $m_2$ and $m_2^*$.

5. Continue with the differential shortcut part of the type 3 attack as outlined
in the previous subsection, i.e. find a suffix $m_3$ such that there is a collision
between

$$F_2(m_1^1||m_1^2||\ldots||m_1^{(n+z-x)/2}||m_3)$$

and

$$F_2(m_1^{1*}||m_1^{2*}||\ldots||m_1^{((n+z-x)/2)*}||m_3).$$

6. Also, the collision in $F_1$ remains:

$$F_1(m_1^1||m_1^2||\ldots||m_1^{(n+z-x)/2}||m_3)$$

collides with

$$F_1(m_1^{1*}||m_1^{2*}||\ldots||m_1^{((n+z-x)/2)*}||m_3),$$

as after the multicollision the message block $m_3$ without a difference is added.

7. As the same message pair constitutes a collision for both $F_1$ and $F_2$, this in
turn results in a collision for the combiner.

The computational complexity of this procedure is as follows. The type 1 collision
search on $F_1$ in step 1 is repeated $(n + z - x)/2$ times, which sums up to an effort
of $(n + z - x)/2 \cdot 2^{c_1}$. Afterwards the type 3 collision search in $F_2$ is performed
using the obtained multicollision. This consists of a birthday part and a type 3w
compression function attack, in total costing $2^{c_2}$ computations. Hence, the total
complexity is $(n + z - x) \cdot 2^{c_1-1} + 2^{c_2}$, and reusing the calculation for $c_2$ from
Section 3 we arrive at


$$(n + z - x) \cdot 2^{c_1-1} + 2^{(n-x+z)/2} + 2^{w}. \qquad (1)$$
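The total cost can be evaluated numerically. The following sketch assumes the cost expression $(n+z-x) \cdot 2^{c_1-1} + 2^{(n-x+z)/2} + 2^{w}$; the helper names are hypothetical, and the parameter values in the example are chosen only for illustration and are not results from the paper.

```python
import math

def log2_sum(*exponents):
    """log2 of a sum of powers of two, given the exponents (avoids overflow)."""
    top = max(exponents)
    return top + math.log2(sum(2.0 ** (e - top) for e in exponents))

def log2_combiner_cost(n, x, z, w, c1):
    """log2 of (n+z-x)*2^(c1-1) + 2^((n-x+z)/2) + 2^w."""
    multicollision = math.log2(n + z - x) + (c1 - 1)  # repeated type 1 attacks on F1
    birthday = (n - x + z) / 2                        # birthday part of the type 3 attack
    return log2_sum(multicollision, birthday, w)      # w: conforming message pair search

# Hypothetical parameters, for illustration only.
cost = log2_combiner_cost(n=128, x=20, z=10, w=30, c1=52)
print(f"total cost about 2^{cost:.2f}; below the birthday bound: {cost < 128 / 2}")
```

Working with exponents rather than the raw powers keeps the computation in ordinary floating point and makes it easy to see which of the three terms dominates.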


4 Alternative Description of MD5 


MD5 is an iterative hash function based on the Merkle-Damgard design princi- 
ple MITI]. It processes 512-bit input message blocks and produces a 128-bit hash 
value. If the message length is not a multiple of 512, an unambiguous padding 
method is applied. For the description of the padding method we refer to [24]. 
The design of MD5 is similar to the design principles of MD4 [23]. In the follow- 
ing, we briefly describe the compression function of MD5. It basically consists 
of two parts: message expansion and state update transformation. A detailed 
description of the MD5 hash function is given in BA. 


4.1 Message Expansion 


The message expansion of MD5 is a permutation of the 16 message words $m_i$
in each round. For each of the four rounds, a permutation of these 16 message
words is used, resulting in 64 32-bit words, denoted by $W_i$, with $0 \le i \le 63$. For
the permutation defining the ordering of message words we refer to [24].


4.2 State Update Transformation 


The state update transformation of MD5 starts from a (fixed) initial value IV
$(A_{-4}, A_{-3}, A_{-2}, A_{-1})$ of four 32-bit registers and updates them in 4 rounds of
16 steps each. The state update transformation of MD5 works on four state
variables. The state update transformation can be written to update one variable
only:


$$A_i = A_{i-1} + (A_{i-4} + f(A_{i-1}, A_{i-2}, A_{i-3}) + W_i + K_i) \lll s_i.$$


However, in our case it turned out that a description which updates 2 state
variables $A_i$ and $B_i$ is beneficial. In this case, one step is computed as follows
(see also Fig. 3):


$$B_i = (A_{i-4} + f(A_{i-1}, A_{i-2}, A_{i-3}) + W_i + K_i) \lll s_i$$
$$A_i = A_{i-1} + B_i.$$


In each step of MD5, different step constants $K_i$, rotation values $s_i$ and Boolean
functions $f$ are used. For the definition of the constants and the rotation values
we refer to [24]. The Boolean function $f$ differs for each round of MD5: IF is
used in the first round, IF3 is used in round 2, XOR is used in round 3, and
ONX is used in the last round:


$$IF(x, y, z) = xy \oplus \bar{x}z$$
$$IF3(x, y, z) = zx \oplus \bar{z}y$$
$$XOR(x, y, z) = x \oplus y \oplus z$$
$$ONX(x, y, z) = y \oplus (x \vee \bar{z})$$


After the last step of the state update transformation, the initial value and the
output values of the last four steps are combined, resulting in the final value
of one iteration, known as the Davies-Meyer hash construction (feed forward). The
result is the final hash value or the initial value for the next message block.
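The two-variable description can be checked against a reference implementation. The sketch below is an illustration, not the authors' code: the step constants, rotation values, message word ordering and IV are the standard ones from RFC 1321 (the paper only refers to its reference [24] for them), the helper names are hypothetical, and only single-block messages are handled. The list `A` stores $A_{-4}, \ldots, A_{63}$ and `B` stores $B_0, \ldots, B_{63}$.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def rotl(x, s):
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

# Round functions in the naming used above (x, y, z are 32-bit words).
def IF(x, y, z):  return (x & y) ^ (~x & z & MASK)
def IF3(x, y, z): return (z & x) ^ (~z & y & MASK)
def XOR(x, y, z): return x ^ y ^ z
def ONX(x, y, z): return y ^ (x | (~z & MASK))

ROUND_F = [IF, IF3, XOR, ONX]
# Standard MD5 parameters (RFC 1321): rotation values, step constants, IV.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)

def word_index(i):
    """Message word used in step i (message expansion as a permutation)."""
    return [i % 16, (5 * i + 1) % 16, (3 * i + 5) % 16, (7 * i) % 16][i // 16]

def compress(iv, block):
    m = struct.unpack("<16I", block)
    a0, b0, c0, d0 = iv
    A = [a0, d0, c0, b0]   # A[j] holds A_{j-4}: (A_-4, A_-3, A_-2, A_-1)
    B = []
    for i in range(64):
        f = ROUND_F[i // 16]
        a1, a2, a3, a4 = A[-1], A[-2], A[-3], A[-4]
        b = rotl((a4 + f(a1, a2, a3) + m[word_index(i)] + K[i]) & MASK, SHIFTS[i])
        B.append(b)                  # B_i
        A.append((a1 + b) & MASK)    # A_i = A_{i-1} + B_i
    # Feed-forward: add the initial value to the outputs of the last four steps.
    out = ((a0 + A[-4]) & MASK, (b0 + A[-1]) & MASK,
           (c0 + A[-2]) & MASK, (d0 + A[-3]) & MASK)
    return out, A, B

def md5_one_block(msg):
    """MD5 of a message short enough (<= 55 bytes) to fit one padded block."""
    assert len(msg) <= 55
    block = msg + b"\x80" + bytes(55 - len(msg)) + struct.pack("<Q", 8 * len(msg))
    out, A, B = compress(IV, block)
    return struct.pack("<4I", *out).hex(), A, B

digest, A, B = md5_one_block(b"abc")
assert digest == hashlib.md5(b"abc").hexdigest()
```

Keeping the full sequences `A` and `B` is what makes this form convenient for differential analysis: conditions can be stated on the rotated word $B_i$ and on the state word $A_i$ separately.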


Fig. 3. Alternative description of the step update transformation of MD5 using two
state variables $A_i$ and $B_i$


5 Path Search Technique for MD5 Type 3 Collisions


We now tackle the problem of finding type 3w collision attacks on the compression
function of MD5. Various automated path search techniques for MD4-like hash
functions have been proposed in the past. In this section, we describe the new path
search technique we developed to solve the problem. In fact, it can be seen as a
variation of the fine-grained condition propagation originally proposed in [6].


5.1 Overview 


As illustrated in Fig. 4, the MSB-path of [I] is a building block of our technique.
Starting from this MSB-path in the middle of the compression function we will 
study and search for many characteristics which propagate through the ONX 
round in the forward direction, and through the IF round in the backward direc- 
tion in a non-linear way. The constraint is that, despite different rotation values 
and Boolean functions, resulting differences in both ends of the state update will 
cancel out after the feed-forward operation. 


[Figure: the steps of the compression function, divided into IF path, MSB path and ONX path]


Fig. 4. The outline of the type 3w collision search with IF-path, MSB-path and ONX
path


5.2 Reviewing the Path Search of De Cannière/Rechberger


In 2006, De Cannière/Rechberger [6] proposed the concept of generalized
conditions. The generalized conditions on a particular pair of words X will be denoted
by $\nabla X$; $\nabla X$ represents the set of values for which the conditions are satisfied.
In order to write this in a compact way, we will reuse the notation listed in
Table 2.

In [6], the authors describe a heuristic method to find complex nonlinear
characteristics for SHA-1 in an efficient way. Follow-up work directly applied 
this method in various settings in the context of SHA-0 and SHA-1 PRIBI. 
The approach may be described as follows. 


1. The starting point is a number of constraints (on the message difference and 
some target differences in the state) for the characteristic. 

2. The basic idea of the algorithm is to randomly pick a bit position which is
not restricted yet (i.e., a ‘?’-bit), impose a zero-difference at this position (a
‘-’-bit), and calculate how the condition propagates. This is repeated until all
unrestricted bits have been eliminated, or until it runs into an inconsistency,
in which case it starts again.

3. The basic idea was improved by also sometimes picking ‘x’-bits once they
start to appear, guessing the sign of their differences (‘u’ or ‘n’), and
backtracking if this does not lead to a solution.


Table 2. Notation for generalized conditions, possible conditions on a pair of bits. The
right half is for completeness only, and will not be used in the paper.
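
To make "calculate how the condition propagates" concrete, here is a small Python sketch for a single bit position of a three-input Boolean function. It assumes the usual meaning of the symbols from [6] (each symbol is the set of allowed bit pairs $(x, x^*)$, with ‘u’ = (1,0) and ‘n’ = (0,1)); the body of Table 2 is not reproduced here, so these sets and the helper names `propagate` and `to_symbol` are assumptions for illustration, and carries of the modular additions are not modelled.

```python
from itertools import product

# Bit pairs (x, x*) allowed by each symbol; assumed to match the left half of Table 2.
CONDITIONS = {
    "?": {(0, 0), (1, 0), (0, 1), (1, 1)},
    "-": {(0, 0), (1, 1)},
    "x": {(1, 0), (0, 1)},
    "0": {(0, 0)},
    "u": {(1, 0)},
    "n": {(0, 1)},
    "1": {(1, 1)},
    "#": set(),
}


def to_symbol(pairs):
    """Return the symbol for a set of pairs, or the sorted pairs if no symbol is listed."""
    for symbol, allowed in CONDITIONS.items():
        if allowed == pairs:
            return symbol
    return sorted(pairs)


def propagate(f, inputs, output):
    """Tighten the conditions on one bit position of out = f(x, y, z).

    Every combination of input pairs is tried for both messages; only the
    combinations whose output pair satisfies the output condition are kept.
    """
    new_in = [set() for _ in inputs]
    new_out = set()
    for combo in product(*(CONDITIONS[s] for s in inputs)):
        out_pair = (f(*(p[0] for p in combo)), f(*(p[1] for p in combo)))
        if out_pair in CONDITIONS[output]:
            for kept, pair in zip(new_in, combo):
                kept.add(pair)
            new_out.add(out_pair)
    return [to_symbol(s) for s in new_in], to_symbol(new_out)


def IF(x, y, z):
    return (x & y) ^ ((1 ^ x) & z)


def XOR(x, y, z):
    return x ^ y ^ z


print(propagate(XOR, "--x", "?"))  # the output difference is forced
print(propagate(IF, "?un", "-"))   # a zero output difference forces a difference in x
print(propagate(IF, "0?-", "x"))   # contradiction: every set becomes empty ('#')
```

A result consisting only of ‘#’ corresponds to the inconsistency mentioned in step 2, at which point the search restarts or backtracks.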


5.3 The Path Search for MD5 


We found that a direct mapping of this strategy to the case of MD5 did not
lead to satisfactory results. It was not possible, even with significant computational
resources, to find a non-linear characteristic for the given setting. There are
two main reasons for this difficulty. The first problem is caused by having two
modular additions (separated by a rotation operation) within one state update.
Fig. 3 shows the iterative step function of MD5 with variables $A_i$ and $B_i$. Hence,
two different carry expansions may occur and by guessing only bits of the state
$A_i$, conditions propagate slowly and contradictions are detected at a very late
stage. Table 3 shows an example with many free (‘?’) bits in $B_i$ due to guessing
bits only on $A_i$.

The second problem is the reduced set of starting constraints, with only a few bit
differences set in the chaining input. In the case of the type 3w collision search,
there is no input difference in the message and only very few differences in
the chaining input and at the chaining output. By guessing even more zero-
differences (‘-’-bits), the found characteristics tend to get very sparse. In fact,
these sparse characteristics are impossible, which is not detected early enough by
the path search algorithm. Hence, most of the time is spent on paths whose
impossibility should be detected earlier. An example of a sparse (in state variable
$A_i$), but impossible characteristic is given in Table 3.

To avoid these problems, the new MD5 path search strategy works as follows: 


1. The starting point is only a small number of constraints (the chaining input
difference, no message difference and the MSB path) for the characteristic.

2. Instead of just picking bits of $A_i$, randomly pick non-restricted bits of the
state $B_i$ as well.

3. Immediately guess the sign of any unrestricted difference (‘x’-bits) as soon
as it occurs, and backtrack if the guess leads to a contradiction.

4. If all ‘x’-bits have been determined, continue with randomly guessing zero-
differences until the next ‘x’-bit occurs.


Table 3. A sparse but impossible characteristic due to guessing too many zero-
differences in $A_i$. Further, conditions do not quickly propagate into $B_i$ and
contradictions are detected at a very late stage.

[Table 3 body omitted: columns i, $\nabla B_i$, $\nabla A_i$, $\nabla W_i$]

Whenever a contradiction occurs, a simple backtracking strategy (depth-first
search) is applied. Using this improved strategy, global contradictions (impossible
characteristics) are found at an earlier stage and impossible paths are less
likely. The disadvantage of this strategy is that long carry expansions are more
likely to occur and the resulting characteristics are less sparse. However, since we
apply the path search mostly in the first round of MD5, even a high number of
conditions can be fulfilled using simple message modification techniques [31].


6 Practical Realization and Results 


We now describe implementations of several parts of the attack. This illustrates
and details the method, and also serves as a validity check of the attack. To
recapitulate our earlier description, the practical implementation of a type 3
collision is divided into three steps:


— Preparatory phase. Many special paths are searched and put on a heap. 

— Birthday phase. Looking through possible pairs of prefixes, a pair needs 
to be found that matches one of the paths on the heap. 

— Differential attack phase. Search for a conforming message pair using one 
of the characteristics generated earlier. 


An optimization that is important in practice is as follows. Starting from the
MSB path in the middle of the MD5 compression function, it suffices to compute
many paths through the last round (ONX part). The last steps of this path will
impose conditions (of type ‘n’ and ‘u’) on the chaining input. This information
is enough for the birthday phase. The result of the birthday phase is a prefix
pair that is compatible with a particular path on the heap. It remains to finish
the characteristic, the IF part, to connect to the MSB part in the middle (see
Fig. 4 for an illustration of the different parts). Having to deal with an actual
chaining input pair in this phase of the attack imposes more constraints on the
path search. However, as we detail in Section and also illustrate with the
characteristic in the table in Appendix A, these constraints can be dealt with in
practice and do not impose any limitation on the attack.


6.1 Runtime for IF Path Search 


In experiments involving the equivalent of about 2000 hours on a single core, we
have verified that the average runtime to find a single IF path is about 36 hours on a
single core, which is about $2^{17}$ seconds, in which about $2^{38}$ MD5 computations
could be done. For these experiments, we did not generate paths for only one
particular starting point, as the choice of a particular starting point has unpredictable
consequences for a particular heuristic (this was also observed in [6]). Instead
we generated many (about 30) starting points (i.e. different sets of conditions
on the chaining input) in a random way to derive meaningful average runtime
estimates. This suggests that, using the proposed strategy, we can expect to find
a path for every set of constraints, albeit with somewhat varying runtime. In
turn, this allows us to estimate the workfactor for a type 3w collision attack on
MD5.

We found that the runtime for the search for an IF-path does not depend on the
number of differences in the cw. The generation of the corresponding IF-paths
can be delayed until after the birthday phase; it contributes to the final search
complexity only in an additive way, and is hence negligible.
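
The conversion between core hours and MD5 computations used above can be retraced with a few lines of Python. This is only a sanity check of the arithmetic; it assumes the per-core rate of about $2^{21}$ MD5 computations per second quoted in the footnote, and the constant and function names are illustrative.

```python
import math

MD5_PER_SECOND = 2 ** 21   # per-core rate from the footnote, treated as exact here (assumption)
SECONDS_PER_HOUR = 3600


def log2_md5_equivalents(core_hours):
    """log2 of the number of MD5 computations that fit into the given core hours."""
    return math.log2(core_hours * SECONDS_PER_HOUR * MD5_PER_SECOND)


print(f"36 h        = 2^{math.log2(36 * SECONDS_PER_HOUR):.2f} seconds")
print(f"one IF path = 2^{log2_md5_equivalents(36):.2f} MD5 computations")
print(f"2000 h      = about {2000 // 36} IF paths at the average rate")
```

The first two values come out just below 17 and 38, matching the rounded figures in the text.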


6.2 A Type 3 Collision Attack Based on Actually Generated Paths 


For the practical generation of type 3w collision attacks on the compression
function of MD5, which in turn lead to a type 3 collision attack on the MD5
hash, we constrain ourselves to differential paths which result in runtimes for
finding a colliding message pair below $2^{58}$. For the preparatory step, it suffices
to generate useful ONX paths. An ONX path is useful if it has a high probability, as
the probability of a collision characteristic in the last round directly affects the
effort for finding a conforming message pair. In order to give
a bound on the allowable probability for the ONX path, we argue as follows.
Among the four rounds (consisting of 16 steps each) the first round can easily
be dealt with via simple message modification. The second round is an MSB-path
and contains 16 conditions (the Boolean function needs to behave as expected
once at every step, see also [I]), the third round contains no conditions as the
Boolean function is an XOR, and the fourth round contains the more complex
ONX-path. Improvements upon the original type 1 collision attack on MD5 by
Wang et al. concentrated on fulfilling more conditions in round 2. In a work from
2005 [25], 14 conditions could already be fulfilled. Subsequent work by Klima [A
and Stevens et al. significantly improved upon this. Conservatively assuming
that only 14 of these conditions can be fulfilled suggests that round four should
not have more than 58 - 16 + 14 = 56 conditions. In Section 6.4 we give several
reasons why this is a very conservative assumption.


1 Each of our 2.0 GHz AMD Opteron(tm) cores performs about $2^{21}$ MD5 computations
per second using OpenSSL 0.9.8g.
2 We tested a range between 1 and 20.

Another important parameter of ONX paths is the number and position of 
differences it has in the last four steps, as this determines (except for carries via 
the feed-forward operation) the uniqueness of the set of allowed pairs of chaining 
inputs that can be canceled. 

Here, we report on empirical findings using an actual implementation of
parts of the attack. In total we spent an equivalent of about 15000 hours on
a single core. The number of distinct paths for type 3w compression function
attacks on MD5 we found, together with their number of conditions on the IV, is
as follows:


number of conditions on IV |  1 |  2 |  3 |   4 |    5 |    6 |     7 |     8 |     9 |     10
number of paths            |  0 |  0 | 10 | 130 | 1216 | 6556 | 21523 | 49293 | 87116 | 127018

Not all found paths may be of use. Let $p_i$ be the number of distinct paths
with i conditions on the IV; we want to find a j such that $(\sum_{i=1}^{j} p_i) \cdot 2^{-j}$ is
maximal. Using the actually generated paths as described above, we found about
$2^{17.34}$ paths with distinct constraints (with at most 9 relevant conditions) on the
chaining input. Also including all found paths with 10 conditions would
improve the attack only if more than $2^{17.34}$ paths were added, which is not
the case.

Using the notation of Section 3, this means x = 17.3, w < 58, and z < 8. Based
on this, a type 3 collision has a runtime of $2^{(128-17.34+9)/2} + 2^{58} = 2^{60.19}$, which
is faster than the expected $2^{64}$ for an ideal hash function of this size. Hence,
MD5 offers no more than 60 bits of security against type 3 collisions.
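
The choice of the cut-off and the resulting runtime can be retraced with the following Python sketch. It assumes the path counts listed above (1 to 10 conditions on the IV), takes the birthday phase as $2^{(n-x+z)/2}$ and the differential phase as $2^{w}$ with w = 58, and uses illustrative function names.

```python
import math

# Number of distinct paths found per number of conditions on the IV.
PATHS = {1: 0, 2: 0, 3: 10, 4: 130, 5: 1216, 6: 6556, 7: 21523,
         8: 49293, 9: 87116, 10: 127018}


def best_cutoff(paths):
    """Return (j, x) maximising x - j, where 2^x paths have at most j conditions."""
    best = None
    total = 0
    for j in sorted(paths):
        total += paths[j]
        if total == 0:
            continue
        x = math.log2(total)
        if best is None or x - j > best[1] - best[0]:
            best = (j, x)
    return best


def log2_type3_runtime(n, x, z, w):
    """log2 of 2^((n - x + z)/2) + 2^w: birthday phase plus differential phase."""
    return math.log2(2 ** ((n - x + z) / 2) + 2 ** w)


j, x = best_cutoff(PATHS)
print(j, round(x, 2), round(log2_type3_runtime(128, x, j, 58), 2))
```

Under these assumptions the best cut-off is j = 9, where each additional condition would have to be paid for by at least a doubling of the number of usable paths.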

Note, however, that in this calculation there is a gross imbalance between time
spent on generating paths (15000 CPU hours are about $2^{47}$ MD5 computations)
and the total runtime of the attack. Spending e.g. $2^7$ times more
computational resources on the path generation might well lead to an increase
from x = 17 to 24, which in turn would decrease the runtime of the overall type
3 collision attack on MD5 to $2^{57}$, and would lead to an attack on the combiner
MD5||SHA-1 with complexity less than 25° (assuming the type 1 collision attack
on SHA-1 is fast enough).


6.3 On Memory Requirements 


Both the generic method due to Joux and the new approach using a type 3
collision attack can be implemented without requiring access to large memory.
For both cases, this results in a runtime loss of about a factor n/2, hence the
relative advantage of the new approach over the generic method remains. Memory
requirements of the attack (birthday phase and differential shortcut phase) are
as follows.

Birthday phase. A naive implementation of the birthday phase would require
a table of size $2^{(n-x+z)/2}$ in order to generate enough pairs to find a match with
one of the $2^x$ paths. However, distinguished point methods may be used on a
truncated version of the output of the compression function.

Let t be the size of the subset of bits that is needed to represent all $2^x$
paths. A lower bound for t is 2x/3, since every bit that is truncated leaves three
possibilities for a path (‘n’, ‘u’, or ‘-’). In practice, t is higher. A memoryless
method will find a partially suitable pair in time $2^{(n-t)/2}$, which would need to be
repeated $2^{t-x+z}$ times if done independently (and hence impose the additional
condition x - z > t/2 on the attack to be more efficient than a generic attack).

However, as described in BIBJ, the distinguished points method can be used
to take advantage of the birthday effect also for generating more collisions (or
suitable pairs), by keeping the entries in the list of each of the distinguished
points. A parallelizable version with linear speed gain is described in [20]. Hence
the search needs to be repeated only $2^{(t-x+z)/2}$ times. As a result, a “memoryless”
version of the birthday phase for the dedicated combiner attack behaves
to a large extent like a “memoryless” version of a generic birthday attack. What
is needed is memory to store $2^z$ candidate pairs which are the outcome of the
birthday phase. In all practical settings, z is small.


Differential shortcut phase. Storing the precomputed paths for the shortcut
attacks takes in the order of a kilobyte per path. For practical values of x between
10 and 20, storage costs are negligible and access to this memory is only needed
once.
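
The two cost estimates for the memoryless birthday phase can be written out as follows. This is a reconstruction under the assumption that a partially suitable pair costs $2^{(n-t)/2}$ and that the exponents of the repetition counts are $t-x+z$ (independent runs) and $(t-x+z)/2$ (distinguished points); it is meant as a consistency check and not as an additional result.

```latex
\begin{align*}
\text{independent repetitions:}\quad
  2^{(n-t)/2} \cdot 2^{\,t-x+z} &< 2^{n/2}
  \iff \tfrac{t}{2} - x + z < 0
  \iff x - z > \tfrac{t}{2}, \\
\text{distinguished points:}\quad
  2^{(n-t)/2} \cdot 2^{(t-x+z)/2} &= 2^{(n-x+z)/2}.
\end{align*}
```

The second line equals the cost of the naive table-based birthday phase, which is why the memoryless variant loses nothing beyond the constant factors discussed above.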


6.4 On Conservative Estimates 
There are several reasons our estimates can be considered to be very conservative: 


— Basing assumption on speed-up methods (message modification, tunnels) is 
very conservative for the following reason. The lack of message differences, 
and the very simple MSB path in round 2 give more freedom to apply speed-up
methods than is the case in type 1 collision search attacks in earlier work.

— Also, early stop methods which further speed-up collision search are not 
considered. 

— Runtime of various path search scenarios are measurements of actual imple- 
mentations, whose runtime may be optimized by some constant factor. 

— For our calculations, we use the highest possible allowed value for w (worst 
case). The expected value is in fact lower. 


3 We will use the term “memoryless” to refer to these techniques, although they do in
fact require some memory, albeit much less than a naive table-based approach.


7 Conclusions and Open Problems
We proposed a new attack that allows collision attacks on combiners with a
runtime below the birthday-bound of the smaller compression function when
the smaller compression function is MD5, potentially reducing the effort for a collision attack
on a combiner like MD5||SHA-1 for the first time. This also answers an open 
question by Joux posed in 2004. The cryptanalytic technique we proposed for 
this is a combination of a birthday-style attack and a differential inside-out tech- 
nique that uses different parts of a collision characteristic at different stages of 
an attack, both before and after a birthday phase. This technique may be of 
independent interest. Based on only the characteristics we generated in practi- 
cal experiments with limited computational resources, a collision attack on the 
combiner with MD5 would already be around 2°° (if the “normal” collision at- 
tack on the other hash functions is fast enough), however we argued that such 
an estimate is very conservative for various reasons. 

This illustrates that the MD5 hash function cannot meet the requirements
of a “weak hash function” as informally defined in this paper. Various open
questions arise from this work: In a vein similar to concatenated combiners, or the
Zipper construction [15], is it possible to come up with other collision resistant
constructions that can use MD5, even though our results can be interpreted as
showing that MD5 is “weaker than weak”? Another open problem is related to
the application of our new cryptanalytic method to hash function constructions
that use two or more parallel streams, like RIPEMD-160 [10], as well as several
SHA-3 candidates. So far it proved difficult to obtain results on RIPEMD-160,
even for interesting reduced variants [17].


A Supplementary Material for Obtained Results


A particular low-weight input chaining difference becomes the MSB-path in the
course of 10 steps. The following table contains the full characteristic illustrating
a candidate for a type 3w compression function attack. As a proof-of-concept,
we provide a representative example of a conforming message pair in Table 4.


[Table: full characteristic, giving the generalized conditions $\nabla A_i$ and $\nabla W_i$ for each step i; table body omitted]


MD5 Is Weaker Than Weak: Attacks on Concatenated Combiners 161 


Table 4. A conforming message pair for the first 16 steps


H_1    | C4F12702 D25873C9 5B88CE47 9A8EBB1D
H_1*   | 44F12502 525873C9 DB88CE47 9A8EBB1D
ΔH_1   | 80000200 80000000 80000000 00000000

M_1    | D830883A AA2456AA 24B9260C D2F17AE9 F893211E 08F4298C 8A0C7756 3492552F
       | C7CB7D9D 7FB6804C 9336A183 44256E0D 6D095FCF 08D8D9EA 5D79C0BA 0F2CD7C5
M_1*   | D830883A AA2456AA 24B9260C D2F17AE9 F893211E 08F4298C 8A0C7756 3492552F
       | C7CB7D9D 7FB6804C 9336A183 44256E0D 6D095FCF 08D8D9EA 5D79C0BA 0F2CD7C5
ΔM_1   | 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       | 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

H_2    | 156733C4 4A05644B 20E6A26E 7718EBA4
H_2*   | 956733C4 CA05644B A0E6A26E F718EBA4
ΔH_2   | 80000000 80000000 80000000 80000000


Abstract. The search for SHA-3 is now well underway and the 51 submissions
accepted for the first round reflected a wide variety of design
approaches. A significant number were built around Rijndael/AES-based 
operations and, in some cases, the AES round function itself. Many of the 
design teams pointed to the forthcoming Intel AES instructions set, to 
appear on Westmere chips during 2010, when making a variety of perfor- 
mance claims. In this paper we study, for the first time, the likely impact 
of the new AES instructions set on all the SHA-3 candidates that might 
benefit. As well as distinguishing between those algorithms that are AES- 
based and those that might be described as AES-inspired, we have de- 
veloped optimised code for all the former. Since Westmere processors are 
not yet available, we have developed a novel software technique based on 
publicly available information that allows us to accurately emulate the 
performance of these algorithms on the currently available Nehalem pro- 
cessor. This gives us the most accurate insight to-date of the potential 
performance of SHA-3 candidates using the Intel AES instructions set. 


1 Introduction 


Intel has announced that a new AES instructions set¹ will be introduced in new
processors such as Westmere and available early in 2010. These instructions will 
provide resistance to a range of software side-channel attacks and offer 
significant performance benefits for encryption and decryption using AES BA. 
Simultaneously the NIST SHA-3 effort to establish a new cryptographic 
hash algorithm is well-underway and several teams of submitters have used AES- 
like transformations as a cryptographic building block. Several of these teams 
have explicitly expressed the assumption that their hashing algorithms could 
take advantage of AES-NI and thereby enjoy significant performance benefits. 
Since the Westmere processor is still unavailable, there have been no substantive
efforts to assess the possible implications of this important issue. In this paper,
we provide the first quantitative analysis that estimates the likely impact of the
Intel AES instructions set on SHA-3 candidates.


1 Denoted AES-NI in this paper for “new instructions”.

The first step is to identify which SHA-3 candidates should be considered, and 
this is not as straightforward as it might appear. AES-NI can be used in different 
combinations to carry out different transformations, and so AES-NI might be 
used in many more ways than would naively be expected. As a result, there are 
submissions for which the variant that provides (say) 256-bit digests gains from 
AES-NI, while the same algorithm providing a 512-bit digest cannot. 

The second step is to develop a sound methodology for implementing the differ- 
ent algorithms, optimising them, and measuring their performance. Clearly this 
is a challenge when Westmere processors are unavailable. So we developed new 
techniques from publicly available information—in effect, uncovering the behavior 
of AES-NI—and this allowed us to emulate Westmere behavior on the publicly- 
available Nehalem chips. While this might appear to detract from the value of 
the performance figures we derive, the level of validation and confirmation that 
took place during this work makes us confident that our results are close to the 
Westmere reality. 

Our sole goal in this paper has been to compare the performance of SHA-3 
candidates when using AES-NI. To this end, we have set aside cryptanalytic 
discussions and we have implemented and optimised all the algorithms that 
we believe might benefit from AES-NI. While the authors of this paper are 
independent (co-)submitters of two SHA-3 proposals, we have strived to be fair 
and consistent. In addition, all the code is publicly available via and we 
welcome interested parties to download and improve upon it. When Westmere 
processors appear, the same samples can be used for real silicon running AES-NI. 


2 The Intel AES Instructions 


To start, we provide a brief description of the Intel AES instructions; complete
details can be found in [13,14]. Intel’s AES instructions set consists of six
instructions, four of which (aesenc, aesenclast, aesdec, and aesdeclast) are
designed to support data encryption and decryption. The names of these instruc- 
tions are short for AES encryption (inner and last) round and AES decryption 
(inner and last) round, see Table H] from Appendix A. These instructions have 
register/register and register/memory variants.

There are two other instructions for the AES key expansion but they seem to 
be of little use to the SHA-3 submissions and are omitted from this paper. 


2.1 What Operations Can We Use AES-NI for? 


Clearly, AES instructions can be used whenever a SHA-3 proposal uses one 
of the internal or final AES encryption (or decryption) rounds. But they can 
be used more widely than this. For instance, calling aesdeclast and aesenc
back-to-back, both with a zeroed second operand, is functionally equivalent to
performing AES MixColumns on the first operand, see Appendix A.

In fact, if we use the pshufb instruction, which shuffles bytes in a 128-bit
word (see Appendix A), then we can isolate all of the AES constituents using
AES-NI [14], namely:


SubBytes, ShiftRows, MixColumns,
InvSubBytes, InvShiftRows, InvMixColumns.
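
The isolation of individual AES constituents can be checked with a small software model. The Python sketch below models the instructions as plain functions on 16-byte lists, following the round structure of FIPS 197 (aesenc = ShiftRows, SubBytes, MixColumns, key XOR; aesdeclast = InvShiftRows, InvSubBytes, key XOR). It is an illustrative model only, with byte-shuffle helpers standing in for pshufb, and it says nothing about timing.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def make_sbox():
    """Build the AES S-box from the field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv ^ 0x63
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = make_sbox()
INV_SBOX = [SBOX.index(v) for v in range(256)]
ZERO = [0] * 16


# The state is 16 bytes in column-major order: byte 4*c + r is row r, column c.
def sub_bytes(s):
    return [SBOX[b] for b in s]


def inv_sub_bytes(s):
    return [INV_SBOX[b] for b in s]


def shift_rows(s):
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


def inv_shift_rows(s):
    return [s[4 * ((c - r) % 4) + r] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        out += [gmul(col[r], 2) ^ gmul(col[(r + 1) % 4], 3)
                ^ col[(r + 2) % 4] ^ col[(r + 3) % 4] for r in range(4)]
    return out


def xor(s, k):
    return [a ^ b for a, b in zip(s, k)]


def aesenc(state, key):
    return xor(mix_columns(sub_bytes(shift_rows(state))), key)


def aesenclast(state, key):
    return xor(sub_bytes(shift_rows(state)), key)


def aesdeclast(state, key):
    return xor(inv_sub_bytes(inv_shift_rows(state)), key)


x = list(range(16))
# aesdeclast followed by aesenc, both with a zero key, leaves only MixColumns.
assert aesenc(aesdeclast(x, ZERO), ZERO) == mix_columns(x)
# A byte shuffle that undoes ShiftRows, followed by aesenclast, leaves only SubBytes.
assert aesenclast(inv_shift_rows(x), ZERO) == sub_bytes(x)
```

The first identity holds because SubBytes acts on each byte independently and therefore commutes with the byte permutation ShiftRows, so the inverse operations of aesdeclast cancel everything in aesenc except MixColumns.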


To illustrate the versatility this gives us, we combine standard xmm instructions
with AES-NI to perform encryption with Rijndael [8] operating on 256-bit blocks.
The plaintext is stored in $xmm_i$ and $xmm_j$, but AES-NI cannot be used directly
since half the bytes of $xmm_i$ must be swapped with half the bytes of $xmm_j$. However,
this swap can be efficiently implemented using two pshufb (1) to pack the bytes
to-be-swapped into two 32-bit words, two pblendw (2) to swap the 32-bit words,
and two pshufb (3) to re-order the bytes giving, in total, the following state
permutation:


[Figure: byte layout of the two xmm registers before and after steps (1), (2) and (3)]

After this, aesenc can be applied in parallel to $xmm_i$ and $xmm_j$, thereby giving the
appropriate ShiftRows for the large state, and Rijndael encryption on a larger
state has been emulated. Techniques like these are important to us since it is
possible that several SHA-3 candidates that do not use the complete AES round,
or that use a larger state, might still benefit from AES-NI.
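
The byte swap described above can be derived mechanically. The following Python sketch computes the byte shuffle that has to precede two independent AES ShiftRows operations so that together they act like the ShiftRows of Rijndael with 256-bit blocks. It assumes the Rijndael specification's row offsets (0, 1, 3, 4) for an 8-column state and a column-major byte order; it models byte positions only, not the actual pshufb and pblendw masks, and all names are illustrative.

```python
AES_SHIFTS = (0, 1, 2, 3)          # ShiftRows offsets for a 4-column (128-bit) state
RIJNDAEL256_SHIFTS = (0, 1, 3, 4)  # ShiftRows offsets for an 8-column (256-bit) state


def rijndael256_shift_rows(state):
    """ShiftRows on 32 bytes stored column-major (byte 4*c + r is row r, column c)."""
    return [state[4 * ((c + RIJNDAEL256_SHIFTS[r]) % 8) + r]
            for c in range(8) for r in range(4)]


def aes_shift_rows_on_halves(state):
    """AES ShiftRows applied separately to each 128-bit half, as aesenc would do."""
    out = []
    for half in range(2):
        base = 16 * half
        out += [state[base + 4 * ((c + AES_SHIFTS[r]) % 4) + r]
                for c in range(4) for r in range(4)]
    return out


def pre_permutation():
    """Return src such that shuffled[i] = state[src[i]] makes both sides agree."""
    positions = list(range(32))
    target = rijndael256_shift_rows(positions)   # output byte i must equal state[target[i]]
    reach = aes_shift_rows_on_halves(positions)  # output byte i reads shuffled[reach[i]]
    src = [None] * 32
    for i in range(32):
        src[reach[i]] = target[i]
    return src


def shuffle(src, state):
    return [state[j] for j in src]


SRC = pre_permutation()
# Bytes whose source lies in the other 128-bit half, i.e. bytes that change register.
crossing = sum(1 for i, j in enumerate(SRC) if i // 16 != j // 16)
print(crossing)
```

In this model 16 of the 32 bytes change register, i.e. half of each register, in line with the statement above. SubBytes and MixColumns act on single bytes and single columns respectively, so they are unaffected by which register a column ends up in.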


2.2 The “In-Scope” SHA-3 Candidates 


Obviously SHA-3 candidates that use the AES round as a building block can 
benefit from using AES-NI. In addition, algorithms that use the AES S-box 
along with some byte shuffling with or without the AES MDS mixing matrix 
can benefit. One can also apply these operations to larger states, as we have seen 
for Rijndael with 256-bit blocks. The main problems in using AES-NI tend to 
arise when designs move away from the AES MDS matrix. Generally speaking, 
this dramatically limits any potential performance gain from AES-NI, partic- 
ularly since most optimised assembly implementations would incorporate the 
MDS matrix operation into table look-ups, potentially combined with other op- 
erations. AES-NI might however still be of interest to these designs, especially 
in thwarting some side-channel attacks. 

There are four submissions that directly, and transparently, use AES rounds
for all hash output lengths. These are ECHO [2], LANE [8], SHAVITE-3 [A], and
VORTEX [23]. For these algorithms it is clear that we can directly use AES-NI.
There are others that are clearly inspired by Rijndael-like techniques in their
construction. These include CHEETAH [22], FUGUE [5], GRØSTL [2], LESAMNTA
[0], LUX [27], and TWISTER [L]]. The submission SHAMATA [I] has already
been withdrawn, and while some other surveys [$] describe SARMAL BI] as being
AES-inspired, a non-AES S-box and MDS mixing layer take it out-of-scope.

Table 1. The SHA-3 submissions with substantial Rijndael-based components. Check-
marks indicate those that might benefit from AES-NI, for different hash output lengths.


Algorithm  | 224-bit  256-bit  384-bit  512-bit

ARIRANG    |    ✓        ✓        no       no
CHEETAH    |    ✓        ✓        no       no
ECHO       |    ✓        ✓        ✓        ✓
FUGUE      |    no       no       no       no
GRØSTL     |    no       no       no       no
LANE       |    ✓        ✓        ✓        ✓
LESAMNTA   |    ✓        ✓        ✓        ✓
LUX        |    ✓        ✓        no       no
SHAVITE-3  |    ✓        ✓        ✓        ✓
VORTEX     |    ✓        ✓        ✓        ✓

While LESAMNTA offers advantages for 256- and 512-bit hash outputs, it is
interesting that only the 256-bit versions of CHEETAH and LUX benefit from
AES-NI. By contrast, it appears that no variant of FUGUE, GRØSTL, or TWISTER
is likely to benefit. These algorithms use a very different MDS mixing matrix
to the AES and, as a result, end up being too distant to use AES-NI in any
efficient way. So even though a combination of AES-NI instructions could be
used to isolate the S-box operations for FUGUE and GRØSTL, say, the table look-
ups typically used for the MDS operations in current optimised implementations
mean that there is no easy way for these algorithms to benefit from AES-NI.

Finally, even though the submission ARIRANG is quite different from the
Rijndael-based constructions, it might potentially benefit from AES-NI. We have
therefore included it in our considerations and Table 1 summarizes the
(alphabetically ordered) list of algorithms and hash output lengths that we consider.


3 Implementation and Measurements 


Obviously the best way to get performance timings is to write the appropriate 
code, run it on a Westmere processor (the first with AES-NI), and measure the 
performance. However, since this processor is not yet available, we propose a 
new methodology that can be used to get an accurate emulation of AES-NI. We 
rely on the fact that Westmere (formerly Nehalem-C) and Nehalem processors 
share the same micro-architecture. This means that if we can find suitable in- 
structions patterns that behave exactly as AES-NI instructions, we will get very 
good estimates for the future performance of AES-based SHA-3 candidates on 
a Westmere processor, but using today’s Nehalem processor. 

Previously, a substitution instruction was proposed B3] for future processors. 
However this substitution does not exhibit the correct behaviour for Westmere 
and can give misleading results, see Section BJJ and Appendix B. Here we provide
a particularly accurate replacement instructions pattern for aesenc and we
explain how to derive it from publicly available information only.


3.1 Replacement Instructions Pattern


The first step is to understand the exact behavior of the AES-NI instructions at
the micro-operation (μop) level, in particular that of aesenc and aesenclast.

An Intel code analyzer tool (IACA [2JJ) is publicly available and gives the
following information about aesenc (aesdec yields the same output):


Total Latency: 6 Cycles; Total number of Uops: 3

| Num of |          Ports pressure in cycles          |    |
|  Uops  | 0 - DV |  1  | 2 - D | 3 - D |  4  |  5  |    |
|   3    |   2    |     |       |       |     |  1  | CP | aesenc xmm1, xmm0


(In this trace, ‘DV’ stands for the divider pipe of port 0, ‘D’ for the data fetch 
pipe of ports 2 and 3. Additionally, an ‘X’ in the trace will be used to denote the 
possible ports a µop can be dispatched to.)

This shows that aesenc consists of three µops, two of which are dispatched
to a unit on port 0 and one which is dispatched to a unit on port 5, and that
the instruction’s latency is 6 cycles. However, this information is too coarse to
provide hints for the right replacement instruction pattern: we need to derive
the exact scheduling of these µops. In what follows, we represent µops by bars
whose length varies according to their latency. The gray bars denote the
µops on port 5 while the white ones denote the µops on port 0. Hence a gray bar
two units long is a 2 cycle µop on port 5 and a white bar three units long is a
3 cycle µop on port 0.

From Intel’s white paper [13] we know that AES-NI instructions are highly
parallelizable. This rules out the sequential µop patterns on port 0. Moreover,
the white paper explains (see Fig. 9 and 15) that aesdec is structured using the
equivalent inverse cipher (described in Appendix B), which is confirmed by an
IACA trace identical to that of aesenc displayed above (see Appendix B). This
leads us to assume that the µop on port 5 is the exclusive-or with the key,
which is corroborated by the purpose of unit 5, see [9]. Therefore, the µop on
port 5 runs in cycle 6 and requires that the µops from port 0 are finished.

Intel’s optimization reference manual gives additional information on the
possible µop latencies and throughput for each port on the Nehalem
micro-architecture. In particular, we see that µops dispatched on port 0 can only
have latencies of 1, 4, or 5 cycles, and that µops on port 5 all have a 1 cycle
latency. Since aesenc has a total latency of 6 cycles, this only leaves three
possible patterns [bar diagrams not reproduced]. (Two µops cannot start at the
same cycle in the same unit, but a µop is started as soon as possible to maximize
the overall throughput.) It is impossible that one µop on port 0 performs the
SubBytes and/or ShiftRows step while it runs in parallel with the other µop
performing the MixColumns step, which would then need the output of the first
µop. So both µops on port 0 perform at least one of the four MixColumn
multiplications of the MixColumns step. The most natural way of doing this is to
symmetrically split the computation into two independent halves of the state. In
this case, the two µops on port 0 have the same latency, which only leaves the
pattern with two overlapping 4 cycle µops on port 0 followed by the 1 cycle µop
on port 5. This is again supported by the IACA trace of the aesimc instruction,
as well as the choice of the equivalent inverse cipher for aesdec.
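
The elimination step above can be replayed numerically. The following Python
sketch is only an illustration under a simplified scheduling model that is
assumed here rather than taken from Intel's documentation: both port 0 µops have
the same latency, they start in consecutive cycles, and the 1 cycle port 5 µop
starts once both have finished. The names PORT0_LATENCIES and total_latency are
hypothetical.

```python
# Simplified model (an assumption for illustration): two port 0 uops of equal
# latency start in consecutive cycles; the 1 cycle port 5 uop starts only
# after both of them have finished.
PORT0_LATENCIES = (1, 4, 5)  # port 0 latencies quoted in the text
PORT5_LATENCY = 1            # port 5 latency quoted in the text


def total_latency(port0_latency: int) -> int:
    """Cycles from the start of the first uop to the end of the port 5 uop."""
    first_done = port0_latency       # first uop starts at cycle 0
    second_done = 1 + port0_latency  # second uop starts one cycle later
    return max(first_done, second_done) + PORT5_LATENCY


# Keep only the latencies that reproduce the 6 cycle total reported by IACA.
candidates = [lat for lat in PORT0_LATENCIES if total_latency(lat) == 6]
print(candidates)
```

Under these assumptions, only a port 0 latency of 4 cycles is compatible with
the 6 cycle total latency of aesenc.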


2 Instructions are split into micro-operations and dispatched to specialized CPU units.
Now we turn to the replacement instruction sequence that gives exactly the
same µop behavior as the instruction aesenc reg, reg. A previously proposed
replacement B37] is not appropriate for Westmere (see Appendix B). Instead,
a sequence that closely simulates the µop behavior of aesenc xmm_i, xmm_j is:

movdqu xmm_k, xmm_i
mulps xmm_i, xmm_j
mulps xmm_k, xmm_j
xorps xmm_i, xmm_k

For now, let us ignore the movdqu instruction. The IACA trace displayed below
shows that the last three instructions of the replacement behave exactly as the
aesenc xmm0, xmm1 instruction with a latency of 6 cycles. It yields two identical
and independent µops (they both come from mulps) on port 0, and a 1 cycle µop on
port 5 which is forced to start after the two µops on port 0, since xorps has a
1 cycle µop on port 5 together with a dependency on register xmm2:


Total Latency: 6 Cycles; Total number of Uops: 4
| Num of |           Ports pressure in cycles            |    |
|  Uops  | 0 - DV  |  1  | 2 -  D  | 3 -  D  |  4  |  5  |    |
|   1    | X       |  1  |         |         |     |  X  | CP | movdqu xmm2, xmm0
|   1    | 1       |     |         |         |     |     |    | mulps xmm0, xmm1
|   1    | 1       |     |         |         |     |     | CP | mulps xmm2, xmm1
|   1    |         |     |         |         |     |  1  | CP | xorps xmm0, xmm2


The reader might wonder why we added the movdqu instruction to the beginning
of the replacement: by introducing a dependency on xmm0, we try to prevent the
processor from re-ordering the instructions at the prefetch and re-order step.
Hence, movdqu acts as a fence and ensures that the replacement fragment exhibits
an atomic behavior similar to that of aesenc. Since movdqu only has a latency of
one cycle and can be dispatched on port 0, 1, or 5, it will in most cases execute
on port 1 in parallel with the other µops, where it does not interfere with the
replacement, and only rarely on port 5 or 0, which would add one cycle to the
replacement latency.

Note however, that though the replacement allows for a very good simulation 
of aesenc in terms of latency, throughput, and port behavior, it does introduce 
a significant issue: the use of a third register xmm_k (k = 2 in IACA’s trace) might
interfere with code surrounding the replacement by introducing false
dependencies. We took extra care in our implementations to avoid these when using the
replacement. This was not an easy task, especially for those SHA-3 candidates 
that make heavy use of AES-NI parallelism such as ECHO and LANE. 

Another potential issue is that the aesenc instruction is 5 to 10 bytes long 
depending on the variant whereas our replacement is 13 to 22 bytes. This can lead 
to an efficiency penalty as the prefetch buffer of the Nehalem micro-architecture 
has a size of 16 bytes. However, an experiment (see Appendix B) shows that the
size of the replacement is unlikely to be a significant factor.

Finally, we refer the reader to Appendix B for a justification of our choice of
the following replacement for memory-based variants like aesenc xmm_i, [mem]:

movdqu xmm_k, xmm_i
mulps xmm_i, [mem]
mulps xmm_k, xmm_j
xorps xmm_i, xmm_k

as well as for a discussion regarding replacements for other AES-NI instructions.
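
To make the register roles in the two replacement sequences explicit, the
sketch below generates their text from register indices. It is a hypothetical
helper (the function name aesenc_replacement and its parameters are not part of
the original work), and the emitted instructions only mimic the timing behavior
of aesenc; they do not compute an AES round.

```python
from typing import List


def aesenc_replacement(i: int, j: int, k: int, mem: str = "") -> List[str]:
    """Return the timing replacement for aesenc xmm_i, xmm_j (or xmm_i, [mem]).

    i is the destination register, j a second source register and k a scratch
    register. If mem is non-empty, the first mulps reads from memory instead.
    """
    if len({i, j, k}) != 3:
        raise ValueError("registers i, j and k must be pairwise distinct")
    first_src = f"[{mem}]" if mem else f"xmm{j}"
    return [
        f"movdqu xmm{k}, xmm{i}",      # fence: creates a dependency on xmm_i
        f"mulps xmm{i}, {first_src}",  # first port 0 uop (plus a load if mem)
        f"mulps xmm{k}, xmm{j}",       # second, independent port 0 uop
        f"xorps xmm{i}, xmm{k}",       # port 5 uop, waits for both mulps
    ]


# Register variant with i = 0, j = 1, k = 2, as in the IACA trace.
for line in aesenc_replacement(0, 1, 2):
    print(line)
```

Because the scratch register xmm_k is overwritten, a caller of such a helper
would still have to pick k so that no false dependency with surrounding code is
introduced.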
3.2 Timing Methodology


For each in-scope candidate and for each hash output length, we implemented 
two versions of the submission. These were identical in every way, except one 
had AES-NI instructions and was used to ensure the correctness of our AES-NI 
optimized implementation against the NIST-submitted test vectors with Intel’s 
Emulator [20]; the other had AES-NI instructions substituted with their replace- 
ments allowing it to run on a Nehalem to derive performance estimates. 

To get consistent results over the candidates, we measured the number of 
cycles (using rdtsc instructions and averaging over more than 10° samples to get 
stable results) taken by the compression function of each algorithm on the same 
Nehalem machine running Linux. However NIST’s API was fully implemented 
to check correctness and, in many cases, these were taken from the reference 
code sent to NIST by the submitters. To eliminate as much noise as possible 
from the OS, high priority scheduling was allocated to the measured code. All 
algorithms were implemented by the same programmers, providing a somewhat 
uniform level of optimization. 
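
The averaging step can be outlined as follows. This is a minimal sketch, not the
harness used for the measurements: Python cannot issue rdtsc directly, so the
cycle counter is passed in as a callable, and cycles_per_byte, FakeCounter, the
64-byte block size, and the sample count are all assumptions chosen for
illustration.

```python
from statistics import mean
from typing import Callable, List


def cycles_per_byte(read_counter: Callable[[], int],
                    compress: Callable[[], None],
                    block_bytes: int,
                    samples: int) -> float:
    """Average the counter difference around compress() and divide by block size."""
    deltas: List[int] = []
    for _ in range(samples):
        start = read_counter()
        compress()
        deltas.append(read_counter() - start)
    return mean(deltas) / block_bytes


class FakeCounter:
    """Deterministic stand-in for a cycle counter (hypothetical, for the demo)."""

    def __init__(self) -> None:
        self.now = 0

    def read(self) -> int:
        return self.now

    def work(self) -> None:
        # Pretend that one compression call costs exactly 640 cycles.
        self.now += 640


counter = FakeCounter()
estimate = cycles_per_byte(counter.read, counter.work, block_bytes=64, samples=1000)
print(estimate)
```

With a real counter the individual samples fluctuate, which is why a large
number of samples is averaged.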


4 Candidate Descriptions and AES-NI Implementations 


In this section we consider the design and discuss the implementation of the 
in-scope candidates. Full details of the algorithms can be found in the respective 
algorithm descriptions, so we only give a brief overview of their functionality 
along with insights into their design with regards to AES-NI. Our implementa- 
tion proposals will be available from our website [29]. 


ARIRANG is a single-pipe compression function-based proposal. The bulk of 
the computation in the compression function consists of the 40-step expansion of 
a 512-bit message block, which is highly efficient in general purpose registers and 
can be pre-computed, and a StepFunction that is repeated 40 times. StepFunction 
requires eight exclusive-ors, four fixed rotations, and two calls to a function G^256
that uses elements of the AES. For longer hash outputs, the equivalent function
G^512 uses a larger MDS matrix that cannot be emulated using AES-NI, and so
any potential gain is restricted to 256-bit outputs. 

However, the extent of this gain is very limited since ARIRANG uses 1/4 of
an AES round as a building block, but the latency cost of aesenc while only
performing 1/4 of an AES round means that the performance of AES-NI, when
compared to the use of lookup tables, is not competitive. Attempts to parallelize
two of the 1/4 AES rounds introduced too many overheads. We conclude that
AES-NI is unlikely to offer any substantial benefits to ARIRANG. 


CHEETAH is a single-pipe compression function-based proposal. The com- 
pression function consists of two strands of computation: a message-dependent 
EXPANDED BLOCK is generated which provides a key-like input to encrypt the 
INTERNAL STATE. While the computations on EXPANDED BLOCK and INTERNAL 
STATE are both Rijndael-inspired, the former uses a different non-AES MDS 
matrix that is hard to emulate. Thus this key derivation is unlikely to benefit
from AES-NI and the use of look-up tables seems better suited. 

For operations on the INTERNAL STATE, the 224- and 256-bit versions of CHEE- 
TAH use an operation InternalRound that can be emulated using AES-NI. How- 
ever, the inherent sequential nature of the rounds and the fact that AES-NI 
cannot be used in the most straightforward way means that while there are 
gains, they are not as significant as they might be for some other submissions. 

For the 384- and 512-bit versions, the operation InternalRound is modified to 
use a larger MDS matrix that, once again, cannot exploit AES-NI. So for these 
larger outputs, there is unlikely to be any gain with AES-NI. 


ECHO is a double-pipe compression-based hash function. The 224- and 256-bit
(resp. 384- and 512-bit) versions encrypt a state of sixteen 128-bit words in eight
(resp. ten) rounds of a compression function calculation. The encryption round
applies two AES rounds to each word of the state with a counter or salt as a 
key, followed by a BIG.MixColumns MDS and row shift operation that provides 
mixing across the entire state. For all hash output lengths, ECHO can benefit 
from AES-NI and, while ECHO is primarily a double-pipe compression-based 
hash function, a simple single-pipe variant was announced at the first NIST 
workshop. We therefore include it in our considerations. 

The AES encryption rounds are directly performed with aesenc with pre- 
computed keys in memory. This allows the algorithm to take full advantage of 
the AES-NI parallelism. The BIG.MixColumns operation however cannot further 
benefit from AES-NI, though it is based on MixColumns. As an ECHO encryption 
round does not vary with the output length, the same optimizations apply. 


LANE is a single-pipe compression function-based hash function. COMPRESS 
consists of a message expansion, a set of six P-PERMUTATIONS, and then a set 
of two Q-PERMUTATIONS. As both sets of permutations are based on the AES 
round, LANE benefits from AES-NI at all hash function output lengths. 

Both PERMUTATIONS are made of L = 2 (resp. L = 4) lines of AES rounds for 
hash outputs of 256 (resp. 512) bits and after each round of AES in each line, 
an operation SwapColumns mixes the L computation strands. LANE therefore 
offers two levels of parallelism: the P- and Q-PERMUTATIONS and the lines inside 
the permutations. The latter alone does not allow us to take full advantage of
AES-NI parallelism, as SwapColumns breaks the instruction flow, so we use the two
levels of parallelism simultaneously: we compute an AES round for each of the
6L lines of the P-PERMUTATIONS in parallel before applying SwapColumns in 
each P-PERMUTATION, and do the same for the Q-PERMUTATIONS. (The code is 
completely unrolled and all keys are precomputed.) 

For 256-bit outputs, the state nicely fits the available xmm registers. But for 
512-bit outputs, the state does not fit anymore and only three P-PERMUTATIONS 
are computed in parallel instead of all six as before. This, in itself, does not 
change the AES-NI throughput as the number of lines is doubled in each PER- 
MUTATION and thus the same number of AES rounds as before is performed 
in parallel. However, the 512-bit version of SwapColumns imposes an additional 
overhead. 
LESAMNTA is a single-pipe compression function-based hash function. The
underlying block cipher has the general topology of an unbalanced Feistel cipher;
at each round two strands of the eight that comprise the cipher state are updated
using a message dependent “subkey” and the round function f256 (resp. f512)
for the 256-bit (resp. 512-bit) hash output. The subkey generation and the f256
and f512 functions in the encryption path all involve AES-like operations and
LESAMNTA can potentially benefit from AES-NI.

For the 256-bit version, the key schedule poses few problems. However, one
difficulty for the encryption path is that the AES-like transformations operate on
64-bit values and the MDS matrix is distinct from that of AES. The MDS matrix 
(i 5) that is used is however a submatrix of MixColumn and so inserting zero 
bytes at the appropriate MixColumns inputs allows the AES-like transformation
to be performed using AES-NI. This can be achieved with the sequence:
pshufb, pxor with a particular constant, aesenc, and pshufb. Note that in this 
case, aesenc is used at 4 of its normal efficiency. 
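
The zero-insertion idea can be illustrated in isolation. The sketch below
assumes, purely for illustration, that the 2x2 matrix of interest is the
top-left block (2 3; 1 2) of the AES MixColumns matrix; it does not claim that
this is the block used by LESAMNTA. It checks that feeding MixColumns a column
whose last two bytes are zero yields, in its first two output bytes, the product
with that 2x2 block. The helper names are hypothetical.

```python
def xtime(a: int) -> int:
    """Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def mul3(a: int) -> int:
    """Multiply by 3 in GF(2^8)."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def sub_mds(a0: int, a1: int):
    """Illustrative 2x2 block [[2, 3], [1, 2]] of the MixColumns matrix."""
    return [xtime(a0) ^ mul3(a1), a0 ^ xtime(a1)]


# Exhaustive check: zero bytes in positions 2 and 3 isolate the 2x2 block.
ok = all(
    mix_column([a0, a1, 0, 0])[:2] == sub_mds(a0, a1)
    for a0 in range(256)
    for a1 in range(256)
)
print(ok)
```

The two remaining output bytes of each column carry values that are simply
discarded, which is one way to see why the instruction is not used at its full
efficiency in this setting.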

In the case of 512-bit hash outputs, the AES-like transformation in the key 
schedule involves an MDS that is too different from MixColumns, and so AES-NI 
is not really of any use there: the keys are therefore precomputed in a classical 
way. However, on the encryption side the round functions now use the full AES 
round, which gives nice advantages. 

For both output sizes, it is possible to use the unbalanced nature of the
Feistel construction to perform four f functions in parallel.
In the 256-bit version, this carries a greater benefit: the four instances of the 
sequence preparing the data mentioned above can also be grouped to increase 
the overall throughput. 


LUX is a stream-cipher based hash function that uses two banks of cipher state:
the BUFFER and the CORE. At each iteration a block of message is input to both
the BUFFER and CORE, both of which are then updated with information being 
passed between them. Sixteen blank rounds of computation seal the hashing 
process after the last block of message has been processed. While the BUFFER 
transformation is very simple, the CORE transformation is built on Rijndael-like 
operations. And it is the Rijndael-like operations in the CORE that are the most 
time-consuming parts of LUX, with mixing of the CORE and BUFFER requiring 
only a few, simple xmm instructions. 

For all hash output lengths, the CORE transformation operates on a larger 
state than we find in the AES. However for 256-bit hash outputs it is equivalent 
to Rijndael operating on 256-bit blocks and techniques described in Section BI] 
can be used. Thus LUX with 256-bit outputs will benefit from AES-NI. 

When used to generate longer hash outputs, however, LUX changes the form 
of the MDS transformation in such a way that it cannot easily be emulated 
using AES-NI. It appears for these longer outputs that AES-NI will not offer 
any advantage. In fairness, the optimised implementations of LUX for 512-bit 
outputs are already extremely competitive. 

As an aside on the timing methodology, it is worth observing that we imple- 
mented sixteen iterations of the classical compression function found in LUX as a 
single COMPRESS operation. This avoided buffer rotations and helped treat LUX
in a way that was more consistent with the other algorithms. 


SHAVITE-3 is a single-pipe compression function-based design, with the com- 
pression function being built closely on a Feistel cipher. The round function for 
this Feistel cipher is built directly from an AES round, and the accompanying 
message expansion also uses the AES round function. As a result, all hash output 
sizes can expect to benefit from AES-NI. 

For the 256-bit hash output, the round function for the 12-round Feistel cipher 
consists of three rounds of the AES and we can therefore use AES-NI directly. To 
avoid any interaction with the memory, it is much more efficient to perform key 
derivation inside the xmm registers. Key derivation produces 36 subkeys of 128 bits 
using a combination of a non-linear layer based on four aesenc operations and 
a linear layer. It is possible to interleave key derivation with encryption since 
there are sufficient registers. The linear part of the key derivation only requires 
a few xmm manipulations (if handled properly) while the four AES rounds in the 
key schedule can be performed in parallel. The Feistel round function involves 
three AES rounds, but this time they are chained. SHAVITE-3 derives a significant 
benefit from avoiding memory access. 

For the 512-bit hash output, the underlying 14-round block cipher is a
generalised Feistel network. At each round there are two parallel invocations of four
AES rounds. Now, however, key derivation produces new 128-bit words in sets of 
eight, rather than four, and so this needs to be performed in place while keeping 
the rest of the state in registers. The linear part of key derivation can still be im- 
plemented efficiently and the eight AES rounds can be parallelized. Within the 
encryption operation, there are now two Feistel round functions, each with four 
dependent AES rounds but these can be interleaved, increasing the throughput 
slightly. SHAVITE-3 is very closely built around the AES round operation and 
gains substantially from AES-NI. 


VORTEX is a single-pipe compression function-based design that uses the
enveloped Merkle-Damgård construction and builds upon MDC-2 [Z]. The building
blocks of VORTEX are Rijndael rounds on 128-bit blocks for VORTEX-256 and
Rijndael rounds on 256-bit blocks for VORTEX-512. Cross-mixing between the
128-bit strands (resp. 256-bit strands for VORTEX-512) is multiplication-based.
The parameter Mr determines whether integer multiplication (Mr = 1) or carry-less
multiplication (Mr = 0) is used. A motivation behind VORTEX was to directly
exploit AES-NI and the carry-less multiplication instructions on future Intel
processors. In this paper we consider the case of Mr = 1. For VORTEX with 256-bit
outputs we can directly exploit the aesenc operation. The key schedule calls 
upon the AES S-box but this can be easily emulated. For the 512-bit outputs, 
the underlying cipher operates on 256-bit states and, using similar techniques 
to those described in Section BJ it is straightforward to operate on this larger 
state. In contrast to some other algorithms, e.g. ECHO and LANE, VORTEX fits 
into the registers. On the other hand, it turns out that there is a bit less room 
to exploit AES-NI parallelism. 
5 Implementation Results


Performance estimates for all SHA-3 candidates considered in this paper are
given in Table 2. The Nehalem measurements were made on a Core i7 920
processor³ clocked at 2.67 GHz with GNU/Linux Debian running a 2.6.26-1-amd64
kernel. The compiler was icc for amd64, Version 11.0, Build 20081105.
As explained in Section 3, we believe that these results will be very close to the
real performance of the algorithms when run on the Westmere processor. For
reference, some performance figures using assembly code from OpenSSL for 
SHA-256 and SHA-512 timed under the same methodology on the same proces- 
sor are 18.6 and 12.0 cycles/Byte respectively. While our results are preliminary, 
we feel they are sound enough to make some general observations. 


Table 2. The predicted Westmere performance in cycles/Byte for those algorithms 
that can benefit from the Intel AES instructions set. For illustration, we provide the 
optimised performance figures given by submitters at the first NIST SHA-3 workshop. 
Other performance data can be found at P]. Since in all cases 224- and 384-bit outputs 
are obtained by truncating 256- and 512-bit outputs, we only give figures for the latter. 


                           256-bit               512-bit
Algorithm                  AES-NI   previous     AES-NI   previous
ARIRANG
CHEETAH
ECHO (double-pipe)
ECHO-SP (single-pipe)
LANE
LESAMNTA
LUX
SHAVITE-3
VORTEX (Mr = 1)


While it is tempting to group all AES/Rijndael-based SHA-3 submissions 
together D], one significant point of difference is that some will not be able to 
take advantage of AES-NI. Further, there are some algorithms, e.g. CHEETAH and 
LUX, for which the shorter hash outputs are likely to gain from AES-NI while the
longer hash outputs, i.e. 384- and 512-bit, will not. Interestingly, CHEETAH is one
of the fastest AES-inspired SHA-3 submissions on the NIST reference platform. 
But its performance when used with AES-NI is somewhat constrained by other 
non-AES components and CHEETAH may be slightly less competitive than the 
other algorithms when using AES-NI. That said, currently optimised code for 
this algorithm is reasonably efficient anyway. Our results for LESAMNTA differ
from those at [17], which unfortunately use a different, inappropriate replacement
instruction (see Section 3.1 and Appendix B).


3 Note that to ensure stable and clean results, we disabled two features of the processor: 
Hyperthreading and Turbo Boost. 
Table 3. For those algorithms that solely use the AES round in its entirety, we give
the number of AES rounds/Byte as a crude measure of how much the AES is used 
during the hashing process. We also give the cost, which is computed as the number of 
cycles/AES round. In general terms, the lower the cost, the more efficiently the AES 
round is being used with respect to AES-NI. 


                           256-bit               512-bit
ECHO (double-pipe)
ECHO-SP (single-pipe)
LANE
SHAVITE-3
VORTEX (Mr = 1)


As would be expected, algorithms that are specifically designed around the 
AES round operation—ECHO, LANE, SHAVITE-3, and VORTEX—have the most 
to gain by appealing to AES-NI. If we consider the figures for 256-bit hash 
outputs then, for single-pipe variants, the throughput performance of these four 
algorithms is similar. However there is a much greater contrast in performance 
when we turn to 512-bit hash outputs, and this is due to differences in design. For 
instance, SHAVITE-3 for 512-bit outputs gains substantially from AES-NI since 
the modified round function for 512-bit outputs offers many opportunities for 
parallelism. This is something that is especially suited to AES-NI. On the other 
hand, when we move from 256- to 512-bit outputs with LANE, while the number 
of AES operations per byte increases in roughly the same proportion as was the 
case for SHAVITE-3, there is a performance impact that comes from doubling the 
size of the lanes in the P- and Q-PERMUTATIONS. Of course, when compared to 
existing optimised implementations LANE will still gain considerably when using 
AES-NI. But it does demonstrate how different design decisions can lead to very 
different performance profiles. 


6 Conclusions 


In this paper we have provided the first in-depth analysis of the likely impact 
of Intel’s AES instructions set on the first round SHA-3 candidates. To do this 
we designed a new methodology to replicate and anticipate the likely behavior 
of AES-NI in Westmere and we feel that this, in itself, will be of considerable 
interest. We have also provided the first performance estimates for those submis- 
sions that are likely to gain from AES-NI. Throughout we have tried to make a 
consistent and comprehensive comparison, and we have used the best currently- 
available information. We believe that our predictions are accurate and, in fact, 
may even be conservative. All the code we have developed is public and this 
will allow others to develop their own optimized versions and to obtain improved 
performance projections. 
Finally this paper sheds light on what has, until now, been a somewhat hidden
issue. It is clear that the new Intel AES instructions set will have a profound 
effect on the performance of some of the SHA-3 submissions. At the same time, 
this low-level support for AES will become very widespread within a few years. 
Certainly this is only one factor among many for the SHA-3 candidates; but it 
may well be one of the important ones. 
Appendix A: Instructions


Table 4. The instructions that provide AES encryption 


aesenc xmm1, xmm2/m128              aesenclast xmm1, xmm2/m128
Tmp := xmm1;                        Tmp := xmm1;
Round Key := xmm2/m128;             Round Key := xmm2/m128;
Tmp := ShiftRows(Tmp);              Tmp := ShiftRows(Tmp);
Tmp := SubBytes(Tmp);               Tmp := SubBytes(Tmp);
Tmp := MixColumns(Tmp);             xmm1 := Tmp xor Round Key;
xmm1 := Tmp xor Round Key;


Table 5. How to derive the MixColumns operation from AES-NI 


aesdeclast xmm1, 0x0...0            aesenc xmm1, 0x0...0
Tmp := xmm1;                        Tmp := xmm1;
Tmp := InvShiftRows(Tmp);           Tmp := ShiftRows(Tmp);
Tmp := InvSubBytes(Tmp);            Tmp := SubBytes(Tmp);
xmm1 := Tmp xor 0x0;                Tmp := MixColumns(Tmp);
                                    xmm1 := Tmp xor 0x0;
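
The identity behind Table 5 can be checked with a small software model. The
following Python sketch is an illustration under stated assumptions rather than
a description of the hardware: the functions aesenc and aesdeclast are plain
re-implementations of the operations listed in Tables 4 and 5 on a 16-byte state
stored column by column, and the S-box is generated from its algebraic
definition instead of being copied from a table.

```python
def gmul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r


def build_sbox():
    """AES S-box: field inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, image in enumerate(SBOX):
    INV_SBOX[image] = value


# State layout: byte at row r, column c is stored at index r + 4 * c.
def shift_rows(s):
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def inv_shift_rows(s):
    return [s[r + 4 * ((c - r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [
            gmul(a[0], 2) ^ gmul(a[1], 3) ^ a[2] ^ a[3],
            a[0] ^ gmul(a[1], 2) ^ gmul(a[2], 3) ^ a[3],
            a[0] ^ a[1] ^ gmul(a[2], 2) ^ gmul(a[3], 3),
            gmul(a[0], 3) ^ a[1] ^ a[2] ^ gmul(a[3], 2),
        ]
    return out


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def aesenc(state, key):
    """ShiftRows, SubBytes, MixColumns, then xor with the round key."""
    t = [SBOX[b] for b in shift_rows(state)]
    return xor(mix_columns(t), key)


def aesdeclast(state, key):
    """InvShiftRows, InvSubBytes, then xor with the round key."""
    return xor([INV_SBOX[b] for b in inv_shift_rows(state)], key)


ZERO = [0] * 16
state = list(range(16))
# aesdeclast with a zero key undoes ShiftRows and SubBytes, so the following
# aesenc with a zero key leaves only MixColumns.
print(aesenc(aesdeclast(state, ZERO), ZERO) == mix_columns(state))
```

The check relies on SubBytes acting byte by byte, so that it commutes with the
byte permutation ShiftRows.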


Description of Some Additional Operations Used in This Work 


pshufb xmm1, xmm2/m128 This instruction is used to generate a byte-wise
permutation of the contents of the first 128-bit operand, where the permutation is
defined by the second operand (xmm register or a memory location). The second
source operand (xmm2/m128) is used as a mask, as follows. For each byte of
xmm2/m128, the least significant four bits specify from where to select the
corresponding byte of the source operand (xmm1). In addition, if the most
significant bit of a byte of xmm2/m128 equals one, then, regardless of the values
of the other bits in that byte, zero is written in the result byte.


pblendw xmm1, xmm2/m128, imm8 This operation “blends” the contents of
two 128-bit operands (two registers or a register and a memory location) at the 
granularity of 16-bit words. Words from the second operand are conditionally 
written to the destination operand, depending on the setting of bits in the byte 
operand imm8. If bit k of this byte is set, then word k of the source is copied to 
the destination. If bit k is zero, word k of the destination is unchanged. 
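
The two descriptions above translate into the following byte-level models. This
is a sketch for illustration only: registers are represented as Python lists of
16 bytes with index 0 as the least significant byte (an assumed convention), and
the function names simply mirror the instruction mnemonics.

```python
def pshufb(dst, mask):
    """Byte shuffle: result[i] is 0 if mask[i] has its top bit set,
    otherwise the byte of dst selected by the low four bits of mask[i]."""
    return [0 if m & 0x80 else dst[m & 0x0F] for m in mask]


def pblendw(dst, src, imm8):
    """Word blend: 16-bit word k of src replaces word k of dst when bit k
    of imm8 is set; other words of dst are left unchanged."""
    out = list(dst)
    for k in range(8):
        if (imm8 >> k) & 1:
            out[2 * k:2 * k + 2] = src[2 * k:2 * k + 2]
    return out


reg = list(range(0x10, 0x20))
reverse_mask = list(range(15, -1, -1))  # select bytes 15, 14, ..., 0
print(pshufb(reg, reverse_mask))
print(pblendw([0] * 16, [1] * 16, 0b00000101))  # copy words 0 and 2
```

A mask byte of 0x80 is the usual way to force a zero into the result, which is
how zero bytes can be inserted before an aesenc.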


Appendix B: Rationale Behind the Replacements 
Additional IACA Traces


AES-NI provides the aesimc instruction to perform InvMixColumns: 


Total Latency: 6 Cycles; Total number of Uops: 3
| Num of |           Ports pressure in cycles            |    |
|  Uops  | 0 - DV  |  1  | 2 -  D  | 3 -  D  |  4  |  5  |    |
|   3    | 2       |     |         |         |     |  1  | CP | aesimc xmm0, xmm1


The IACA tool supports the aesdec instruction, the trace of which is shown
below, but does not support the aesdeclast instruction. From what has been
derived for aesenc, aesdec, and aesimc, it is reasonable to assume its trace
would have been identical to that of aesdec.


Total Latency: 6 Cycles; Total number of Uops: 3
| Num of |           Ports pressure in cycles            |    |
|  Uops  | 0 - DV  |  1  | 2 -  D  | 3 -  D  |  4  |  5  |    |
|   3    | 2       |     |         |         |     |  1  | CP | aesdec xmm0, xmm1


Instructions Replacement Size 


In order to evaluate the possible impact on the prefetching step (the prefetch
buffer has a size of 16 bytes) or on the instruction cache, we conducted the
following experiment: we went through the same kind of analysis as we conducted
on aesenc and we replaced pmulld xmm15, [mem], which has two sequential µops
of 3 cycles on port 1, by


phminposuw xmm15, [mem]
phminposuw xmm15, xmm15


which have a single µop on port 1 each, but are interdependent. While the size of
pmulld is 7 bytes and the size of the proposed replacement is 17 bytes, they both
ran on the Nehalem with identical timings. Not only does this lend support to our
approach, but it also suggests that the increased size of our replacement for
the AES-NI instructions is unlikely to have a significant effect.
Instructions Replacement for the Memory Variant


The aesenc reg, [mem] replacement we propose is actually quite similar to the
aesenc reg, reg one. The only difference lies in the simulation of the memory
access: it should not impact the µop flows and, to accurately simulate aesenc
reg, [mem], the corresponding µop should start at the same cycle as the first
µop on port 0. This is why we chose to launch the memory access at the first
mulps instruction:

movdqu xmm_k, xmm_i
mulps xmm_i, [mem]
mulps xmm_k, xmm_j
xorps xmm_i, xmm_k


The validity of this replacement is assessed by the two following IACA traces:


Total Latency: 12 Cycles; Total number of Uops: 4
| Num of |           Ports pressure in cycles            |    |
|  Uops  | 0 - DV  |  1  | 2 -  D  | 3 -  D  |  4  |  5  |    |
|   4    | 2       |     | 1    1  | X    X  |     |  1  | CP | aesenc xmm0, [0x6008f0]

Total Latency: 11 Cycles; Total number of Uops: 5
| Num of |           Ports pressure in cycles            |    |
|  Uops  | 0 - DV  |  1  | 2 -  D  | 3 -  D  |  4  |  5  |    |
|   1    | X       |  1  |         |         |     |  X  |    | movdqu xmm2, xmm0
|   2    | 1       |     | 1    1  | X    X  |     |     | CP | mulps xmm0, [0x6008f0]
|   1    | 1       |     |         |         |     |     |    | mulps xmm2, xmm1
|   1    |         |     |         |         |     |  1  | CP | xorps xmm0, xmm2


An unfortunate side-effect of this replacement is that it affects an additional xmm 
register, putting additional constraints when avoiding false dependencies. This 
mainly concerns the ECHO and LANE algorithms. 


Equivalent Inverse Cipher 


The equivalent inverse cipher [8] allows for a decryption structure that is very 
similar to that of encryption. This is achieved by noticing that the straightfor- 
ward decryption algorithm 


InvShiftRows, InvSubBytes, AddRoundKey, InvMixColumns,
can be replaced by the equivalent one
InvSubBytes, InvShiftRows, InvMixColumns, AddRoundKey,


as the first two operations commute and the last two commute when the key
expansion is tweaked accordingly; decryption is now similarly structured to encryption:


SubBytes, ShiftRows, MixColumns, AddRoundKey . 


An Inappropriate Replacement 


In this paragraph, we give the IACA trace for the pmuludq instruction. This
shows that the replacement proposed in 3] is not appropriate as a generic
aesenc replacement on the Nehalem architecture. In the trace below, pmuludq
has a latency of 3 cycles whereas the aesenc instruction has a latency of 6 cycles,
so the two instructions behave differently. It is even worse at the µop level, as
aesenc has 3 µops dispatched through ports 0 and 5 whereas pmuludq has a
single µop dispatched on port 1: this will lead to very distinct behaviors, and
almost certainly a different throughput.


Total Latency: 3 Cycles; Total number of Uops: 1
| Num of |           Ports pressure in cycles            |    |
|  Uops  | 0 - DV  |  1  | 2 -  D  | 3 -  D  |  4  |  5  |    |
|   1    |         |  1  |         |         |     |     | CP | pmuludq xmm0, xmm1


This explains the differences in the performance of LESAMNTA derived in this 
paper and quoted at [17].
Abstract. Group encryption (GE) schemes, introduced at Asiacrypt’07,
are an encryption analogue of group signatures with a number of inter- 
esting applications. They allow a sender to encrypt a message (in the 
CCA2 security sense) for some member of a PKI group concealing that 
member’s identity (in a CCA2 security sense, as well); the sender is able 
to convince a verifier that, among other things, the ciphertext is valid 
and some anonymous certified group member will be able to decrypt the 
message. As in group signatures, an opening authority has the power of 
pinning down the receiver’s identity. The initial GE construction uses in- 
teractive proofs as part of the design (which can be made non-interactive 
using the random oracle model) and the design of a fully non-interactive 
group encryption system is still an open problem. In this paper, we give 
the first GE scheme, which is a pure encryption scheme in the standard 
model, i.e., a scheme where the ciphertext is a single message and proofs 
are non-interactive (and do not employ the random oracle heuristic). As 
a building block, we use a new public key certification scheme which 
incurs the smallest amount of interaction, as well. 


Keywords: Group encryption, anonymity, provable security. 


1 Introduction 


Group encryption (GE) schemes, introduced by Kiayias, Tsiounis and Yung [29],
are the encryption analogue of group signatures [[6]. The latter primitives ba- 
sically allow a group member to sign messages in the name of a group without 
revealing his identity. In a similar spirit, GE systems aim to hide the identity of 
a ciphertext’s recipient and still guarantee that he belongs to a population of 
registered members in a group administered by a group manager (GM). A sender 
can generate an anonymous encryption of some plaintext m intended for a re- 
ceiver holding a public key that was certified by the GM (message security and 
receiver anonymity being both in the CCA2 sense). The ciphertext is prepared 
while leaving an opening authority (OA) the ability to “open” the ciphertext
(analogously to the opening operation in group signatures) and uncover the
receiver’s name. At the same time, the sender should be able to convince a verifier
that (1) the ciphertext is a valid encryption under the public key of some group 
member holding a valid certificate; (2) if necessary, the opening authority will 
be able to find out who the receiver is; (3) (optionally) the plaintext is a witness 
satisfying some public relation. 


MOTIVATIONS. The GE primitive was motivated by various privacy applications 
such as anonymous trusted third parties or oblivious retriever storage. Many 
cryptographic protocols such as fair exchange, fair encryption or escrow encryp- 
tion, involve trusted third parties that remain offline most of the time and are 
only involved to resolve problems. Group encryption allows one to verifiably 
encrypt some message to such a trusted third party while hiding his identity 
among a set of possible trustees. For instance, a user can encrypt a key (e.g., in 
an “international key escrow system” ) to his own national trusted representative 
without letting the ciphertext reveal the latter’s identity, which could leak infor- 
mation on the user’s citizenship. At the same time, everyone can be convinced 
that the ciphertext is heading for an authorized trustee. 

Group encryption also finds applications in ubiquitous computing, where 
anonymous credentials must be transferred between peer devices belonging to 
the same group. Asynchronous transfers may require involving an untrusted
storage server to temporarily store encrypted credentials. In such a situation, 
GE schemes may be used to simultaneously guarantee that (1) the server retains 
properly encrypted valid credentials that it cannot read; (2) credentials have 
a legitimate anonymous retriever; (3) if necessary, an authority will be able to 
determine who the retriever is. 

By combining cascaded group encryptions using multiple trustees and accord- 
ing to a sequence of identity discoveries and transfers, one can also implement 
group signatures where signers can flexibly specify how a set of trustees should 
operate to open their signatures. 


PRIOR WORKS. Kiayias, Tsiounis and Yung (KTY) formalized the concept
of group encryption and provided a suitable security modeling. They presented
a modular design of a GE system and proved that, beyond zero-knowledge
proofs, anonymous public key encryption schemes with CCA2 security, digital
signatures, and equivocal commitments are necessary to realize the primitive.
They also showed how to efficiently instantiate their general construction using
Paillier’s cryptosystem (or, more precisely, a modification of the Camenisch-Shoup
[13] variant of Paillier). While efficient, their scheme is not a single message
encryption, since it requires the sender to interact with the verifier in a
Σ-protocol to convince him that the aforementioned properties are satisfied.
Interaction can be removed using the Fiat-Shamir paradigm (and thus the
random oracle model [4]), but only heuristic arguments (see also [[4]) are
then possible in terms of security.

Independently, Qin et al. considered a closely related primitive with
non-interactive proofs and short ciphertexts. However, they avoid interaction by
explicitly employing a random oracle and also rely on strong interactive
assumptions. As we can see, none of these schemes is a truly non-interactive encryption
scheme without the random oracle idealization.


OUR CONTRIBUTION. As already noted in various contexts such as anonymous 
credentials B], rounds of interaction are expensive and even impossible at times 
as, in some applications, proofs should be verifiable by third parties that are 
not present when provers are available. In the setting of group encryption, this 
last concern is even more constraining as it requires the sender, who may be 
required to repeat proofs with many verifiers, to maintain a state and remember 
the random coins that he uses to encrypt every single ciphertext. In the frequent 
situation where many encryptions have to be generated using independent ran- 
dom coins, this becomes a definite bottleneck. 

This paper solves the above problems and describes the first realization of 
group encryption which is a fully non-interactive encryption scheme with CCA2- 
security and anonymity in the standard model. In our scheme, senders do not 
need to maintain a state: thanks to the Groth-Sahai non-interactive proof 
systems, the proof of a ciphertext can be generated once-and-for-all at the same 
time as the ciphertext itself. Furthermore, using suitable parameters and for a 
comparable security level, we can also shorten ciphertexts by a factor of 2 in 
comparison with the KTY scheme. As far as communication goes, the size of 
proofs allows decreasing by more than 75% the number of transmitted bits be- 
tween the sender and the verifier. 

Since our goal is to avoid interaction, we also design a joining protocol (i.e., a 
protocol whereby the user effectively becomes a group member and gets his public
key certified by the GM) which requires the smallest amount of interaction:
as in the Kiayias-Yung group signature BO], only two messages have to be exchanged
between the GM and the user and the latter need not prove anything
about his public key. In particular, rewinding is not necessary in security proofs 
and the join protocol can be safely executed in a concurrent environment, when 
many users want to register at the same time. The join protocol uses a non- 
interactive public key certification scheme where discrete-logarithm-type public 
keys can be signed as if they were ordinary messages (and without knowing the 
matching private key) while leaving the ability to efficiently prove knowledge 
of the certificate/public key using the Groth-Sahai techniques. To certify users 
without having to rewind in security proofs, the KTY scheme uses groups of
hidden order (and more precisely, Camenisch-Lysyanskaya signatures [[2]). In 
public order groups, to the best of our knowledge, our construction is the first 
certification method that does not require any form of proof of knowledge of 
private keys. We believe it to be of independent interest as it can be used to 
construct group signatures (in the standard model) where the joining mecha- 
nism tolerates concurrency in the model of without demanding more than 
two moves of interaction. 


Although the simulator does not need to rewind proofs of knowledge in [29], users
still have to interactively prove the validity of their public key.

ORGANIZATION. In section 2, we describe the intractability assumptions that
we need and recall the KTY model of group encryption. Section 3 explains
the building blocks of our construction and notably describes our certification
scheme. Our GE system is depicted in section 4.


2 Background 


In the paper, when S is a set, $x \stackrel{R}{\leftarrow} S$ denotes the action of choosing x at random
in S. By $a \in \mathrm{poly}(\lambda)$, we mean that a is a polynomial in $\lambda$ while $b \in \mathrm{negl}(\lambda)$ says
that b is a negligible function of $\lambda$. When a and b are two binary strings, $a \| b$
stands for their concatenation.


2.1 Complexity Assumptions 


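We use groups $(\mathbb{G}, \mathbb{G}_T)$ of prime order p with an efficiently computable map
$e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ such that $e(g^a, h^b) = e(g,h)^{ab}$ for any $(g,h) \in \mathbb{G} \times \mathbb{G}$, $a, b \in \mathbb{Z}$,
and $e(g,h) \neq 1_{\mathbb{G}_T}$ whenever $g, h \neq 1_{\mathbb{G}}$.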

In this setting, we rely on an assumption introduced in [7] that allows con- 
structing efficient non-interactive proofs as pointed out in [27]. 


Definition 1. The Decision Linear Problem (DLIN) in $\mathbb{G}$ is to distinguish
the distribution $D_1 = \{(g, g^a, g^b, g^{ac}, g^{bd}, g^{c+d}) \mid a, b, c, d \stackrel{R}{\leftarrow} \mathbb{Z}_p^*\}$ from the
distribution $D_2 = \{(g, g^a, g^b, g^{ac}, g^{bd}, g^{z}) \mid a, b, c, d, z \stackrel{R}{\leftarrow} \mathbb{Z}_p^*\}$. The Decision Linear
Assumption is the intractability of DLIN for any PPT algorithm D.


This problem amounts to deciding whether vectors $\vec{g}_1 = (g^a, 1, g)$, $\vec{g}_2 = (1, g^b, g)$
and $\vec{g}_3$, formed by the last three elements of the tuple, are linearly dependent or
not. We also consider a related computational
problem which bears similarities with simultaneous pairing problems [26, 25].


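Definition 2. The Simultaneous Double Pairing problem (S2P) in $\mathbb{G}$ is,
given $(g_1, g_2, g_{1,c}, g_{2,d}) \in \mathbb{G}^4$, to find a triple $(u, v, w) \in \mathbb{G}^3 \setminus \{(1_{\mathbb{G}}, 1_{\mathbb{G}}, 1_{\mathbb{G}})\}$ such
that $e(g_1, u) = e(g_{1,c}, w)$ and $e(g_2, v) = e(g_{2,d}, w)$.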

Like the simultaneous triple pairing assumption [25], the hardness of this problem
is implied by the DLIN assumption: given $(g, g_1, g_2, g_1^c, g_2^d, \eta \stackrel{?}{=} g^{c+d})$, any
algorithm that, on input of $(g_1, g_2, g_1^c, g_2^d)$, outputs a non-trivial $(u, v, w)$ such
that $e(g_1, u) = e(g_1^c, w)$, $e(g_2, v) = e(g_2^d, w)$ allows telling whether $\eta = g^{c+d}$ by
testing if $e(g, u \cdot v) = e(\eta, w)$ (since $u = w^c$ and $v = w^d$).
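The reduction above can be sanity-checked in a toy model where every group element is represented by its discrete logarithm modulo a prime and the pairing multiplies logarithms. This is only a sketch under stated assumptions: the modulus `P`, the helper names and the "solver" (which cheats by knowing c and d) are all hypothetical, and the model offers no security whatsoever.

```python
import random

P = 2**61 - 1  # toy prime order (assumption), far too small for real use

def pair(x, y):
    """Toy pairing on discrete logs: e(g^x, g^y) = e(g, g)^(x*y)."""
    return x * y % P

def hypothetical_s2p_solver(c, d, rng):
    """Stand-in for an S2P adversary. It cheats by knowing c and d,
    which a real adversary would not be given."""
    w = rng.randrange(1, P)
    return c * w % P, d * w % P, w          # logs of u = w^c, v = w^d, w

def is_linear_tuple(eta, u, v, w):
    """DLIN test from the text: e(g, u*v) == e(eta, w)."""
    return pair(1, (u + v) % P) == pair(eta, w)

rng = random.Random(2009)
a, b, c, d = (rng.randrange(1, P) for _ in range(4))   # g1 = g^a, g2 = g^b
u, v, w = hypothetical_s2p_solver(c, d, rng)

# The output satisfies both S2P relations for (g1, g2, g1^c, g2^d).
s2p_ok = pair(a, u) == pair(a * c % P, w) and pair(b, v) == pair(b * d % P, w)
accepts_real = is_linear_tuple((c + d) % P, u, v, w)       # eta = g^(c+d)
accepts_fake = is_linear_tuple((c + d + 1) % P, u, v, w)   # eta != g^(c+d)
```

In this model the test accepts exactly when the logarithm of η equals c + d, which mirrors the argument that u = w^c and v = w^d.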

We also use the Hidden Strong Diffie-Hellman (HSDH) assumption introduced 
in as a strengthening of the Strong Diffie-Hellman assumption [6]. 


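Definition 3. The $\ell$-Hidden Strong Diffie-Hellman problem ($\ell$-HSDH) in
$\mathbb{G}$ is, given $(g, \Omega = g^{\omega}, u) \stackrel{R}{\leftarrow} \mathbb{G}^3$ and triples $(g^{1/(\omega + c_i)}, g^{c_i}, u^{c_i})$ with $c_1, \ldots, c_\ell \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$,
to find another triple $(g^{1/(\omega + c)}, g^{c}, u^{c})$ such that $c \neq c_i$ for $i = 1, \ldots, \ell$.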

We finally need the following variant of the Diffie-Hellman assumption. 
Definition 4. The Flexible Diffie-Hellman problem (FlexDH) is, given
$(g, g^a, g^b) \in \mathbb{G}^3$, where $a, b \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$, to find a triple $(C, C^a, C^{ab})$ such that $C \neq 1_{\mathbb{G}}$.

A potentially easier problem considered in only requires to output $(C, C^{ab})$
on input of the same values. The latter problem was proved generically hard in
prime order groups [83]. In bilinear groups, any algorithm solving either of these
two problems would make it easy to recognize $g^{abc}$ on input of $(g, g^a, g^b, g^c)$,
which is a problem suggested for the first time in [8, Section 8].
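The last claim can be spelled out as a short derivation. The symbols t (the unknown discrete logarithm of C) and η (the candidate element to be recognized) are introduced here for illustration only.

```latex
% Given (g, g^a, g^b, g^c), a candidate \eta, and a solver output (C, C^{ab})
% with C = g^t, t \neq 0:
\begin{align*}
  e(C^{ab}, g^c) &= e(g, g)^{t \cdot abc}, \\
  e(C, \eta)     &= e(g, \eta)^{t}.
\end{align*}
% Since t \neq 0 and the groups have prime order p, the two values are equal
% if and only if e(g, \eta) = e(g, g)^{abc}, i.e. if and only if \eta = g^{abc}.
```

A solver for the full FlexDH problem yields the same test by simply discarding the middle component of its output triple.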


2.2 Model and Security Notions 


Group encryption schemes involve a sender, a verifier, a group manager (GM) 
that manages the group of receivers and an opening authority (OA) that is 
able to uncover the identity of ciphertext receivers. A group encryption system 
is formally specified by the description of a relation R as well as a collection 
$\mathsf{GE} = (\mathsf{SETUP}, \mathsf{JOIN}, (\mathcal{G}_r, \mathcal{R}, \mathsf{sample}_{\mathcal{R}}), \mathsf{ENC}, \mathsf{DEC}, \mathsf{OPEN}, (\mathcal{P}, \mathcal{V}))$ of algorithms
or protocols. Among these, SETUP is a set of initialization procedures that all
take (explicitly or implicitly) a security parameter $\lambda$ as input. They can be split
into one that generates a set of public parameters param (a common reference
string), one for the GM and another one for the OA. We call them $\mathsf{SETUP}_{init}(\lambda)$,
$\mathsf{SETUP}_{GM}(\mathsf{param})$ and $\mathsf{SETUP}_{OA}(\mathsf{param})$, respectively. The latter two procedures
are used to produce key pairs $(\mathsf{pk}_{GM}, \mathsf{sk}_{GM})$, $(\mathsf{pk}_{OA}, \mathsf{sk}_{OA})$ for the GM and the OA.
In the following, param is incorporated in the inputs of all algorithms although 
we sometimes omit to explicitly write it. 

$\mathsf{JOIN} = (J_{user}, J_{GM})$ is an interactive protocol between the GM and the prospective
user. As in BO], we will restrict this protocol to have minimal interaction and
consist of only two messages: the first one is the user’s public key pk sent by $J_{user}$
to $J_{GM}$ and the latter’s response is a certificate $\mathsf{cert}_{pk}$ for pk that makes the user’s
group membership effective. We do not require the user to prove knowledge of his
private key sk or anything else about it. In our construction, valid keys will be
publicly recognizable and users do not need to prove their validity. After the
execution of JOIN, the GM stores the public key pk and its certificate $\mathsf{cert}_{pk}$ in a public
directory database.

Algorithm $\mathsf{sample}_{\mathcal{R}}$ allows sampling pairs $(x, w) \in \mathcal{R}$ (made of a public value
x and a witness w) using keys $(\mathsf{pk}_{\mathcal{R}}, \mathsf{sk}_{\mathcal{R}})$ produced by $\mathcal{G}_r$. Depending on the
relation, $\mathsf{sk}_{\mathcal{R}}$ may be the empty string (as will be the case in our scheme). The
testing procedure $\mathcal{R}(x, w)$ returns 1 whenever $(x, w) \in \mathcal{R}$. To encrypt a witness
w such that $(x, w) \in \mathcal{R}$ for some public x, the sender fetches the pair $(\mathsf{pk}, \mathsf{cert}_{pk})$
from database and runs the randomized encryption algorithm. The latter takes
as input w, a label L, the receiver’s pair $(\mathsf{pk}, \mathsf{cert}_{pk})$ as well as public keys $\mathsf{pk}_{GM}$
and $\mathsf{pk}_{OA}$. Its output is a ciphertext $\psi \leftarrow \mathsf{ENC}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, w, L)$. On
input of the same elements, the certificate $\mathsf{cert}_{pk}$, the ciphertext $\psi$ and the random
coins $coins_{\psi}$ that were used to produce it, the non-interactive algorithm
$\mathcal{P}$ generates a proof $\pi_{\psi}$ that there exists a certified receiver whose public key
was registered in database and that is able to decrypt $\psi$ and obtain a witness w
such that $(x, w) \in \mathcal{R}$. The verification algorithm $\mathcal{V}$ takes as input $\psi$, $\mathsf{pk}_{GM}$, $\mathsf{pk}_{OA}$,
$\pi_{\psi}$ and the description of $\mathcal{R}$ and outputs 0 or 1. Given $\psi$, L and the receiver’s
private key sk, the output of DEC is either a witness w such that $(x, w) \in \mathcal{R}$ or a
rejection symbol $\perp$. Finally, OPEN takes as input a ciphertext/label pair $(\psi, L)$
and the OA’s secret key $\mathsf{sk}_{OA}$ and returns a receiver’s public key pk.

The security model considers four properties termed correctness, message
security, anonymity and soundness. In the following, we sometimes denote by
$\langle output_A \mid output_B \rangle \leftarrow \langle A(input_A), B(input_B) \rangle(common\text{-}input)$ the execution of a
protocol between A and B obtaining their own outputs from their inputs.


CORRECTNESS. The correctness property requires that the following experiment 
returns 1 with overwhelming probability. 


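Experiment $\mathbf{Expt}^{correctness}(\lambda)$
  $\mathsf{param} \leftarrow \mathsf{SETUP}_{init}(\lambda)$; $(\mathsf{pk}_{\mathcal{R}}, \mathsf{sk}_{\mathcal{R}}) \leftarrow \mathcal{G}_r(\lambda)$; $(x, w) \leftarrow \mathsf{sample}_{\mathcal{R}}(\mathsf{pk}_{\mathcal{R}}, \mathsf{sk}_{\mathcal{R}})$;
  $(\mathsf{pk}_{GM}, \mathsf{sk}_{GM}) \leftarrow \mathsf{SETUP}_{GM}(\mathsf{param})$; $(\mathsf{pk}_{OA}, \mathsf{sk}_{OA}) \leftarrow \mathsf{SETUP}_{OA}(\mathsf{param})$;
  $\langle \mathsf{pk}, \mathsf{sk}, \mathsf{cert}_{pk} \mid \mathsf{pk}, \mathsf{cert}_{pk} \rangle \leftarrow \langle J_{user}, J_{GM}(\mathsf{sk}_{GM}) \rangle(\mathsf{pk}_{GM})$;
  $\psi \leftarrow \mathsf{ENC}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, w, L)$;
  $\pi_{\psi} \leftarrow \mathcal{P}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, w, L, \psi, coins_{\psi})$;
  If $((w \neq \mathsf{DEC}(\mathsf{sk}, \psi, L)) \vee (\mathsf{pk} \neq \mathsf{OPEN}(\mathsf{sk}_{OA}, \psi, L))$
    $\vee\, (\mathcal{V}(\psi, L, \pi_{\psi}, \mathsf{pk}_{GM}, \mathsf{pk}_{OA}) = 0))$ return 0 else return 1;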


MESSAGE SECURITY. The message secrecy property is defined by an experiment 
where the adversary has access to oracles that may be stateful (and maintain a 
state across queries) or stateless: 


- $\mathsf{DEC}(\mathsf{sk})$: is a stateless oracle for the user decryption function DEC. When
this oracle is restricted not to decrypt a ciphertext-label pair $(\psi, L)$, we
denote it by $\mathsf{DEC}^{\neg\langle \psi, L \rangle}$.

- $\mathsf{CH}^b_{ror}(\lambda, \mathsf{pk}, w, L)$: is a real-or-random challenge oracle that is only queried
once. It returns $(\psi, coins_{\psi})$ such that $\psi \leftarrow \mathsf{ENC}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, w, L)$
if b = 1 whereas, if b = 0, $\psi \leftarrow \mathsf{ENC}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, w', L)$ encrypts a
random plaintext $w'$ uniformly chosen in the space of plaintexts of length $O(\lambda)$.
In either case, $coins_{\psi}$ are the random coins used to generate $\psi$.

- $\mathsf{PROVE}^b_{\mathcal{P},\mathcal{P}'}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, \mathsf{pk}_{\mathcal{R}}, x, w, \psi, L, coins_{\psi})$: is a stateful oracle
that the adversary can query on multiple occasions. If b = 1, it runs the
real prover $\mathcal{P}$ on the inputs to produce an actual proof $\pi_{\psi}$. If b = 0, the
oracle runs a simulator $\mathcal{P}'$ that uses the same inputs as $\mathcal{P}$ except witness
w, $coins_{\psi}$ and generates a simulated proof.


These oracles are used in an experiment where the adversary controls the GM, 
the OA and all members but the honest receiver. The adversary A is the dishon- 
est GM that certifies the honest receiver in an execution of JOIN. She has oracle 
access to the decryption function DEC of that receiver. At the challenge phase, 
she probes the challenge oracle for a label and a pair $(x, w) \in \mathcal{R}$ of her choice.
After the challenge phase, she can also invoke the PROVE oracle on multiple 
occasions and eventually aims to guess the bit b chosen by the challenger. 

As pointed out in [29], designing an efficient simulator $\mathcal{P}'$ (for executing
$\mathsf{PROVE}^b_{\mathcal{P},\mathcal{P}'}(.)$ when b = 0) is part of the security proof and might require a
simulated common reference string.


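Definition 5. A GE scheme satisfies message security if, for any PPT adversary
$\mathcal{A}$, the experiment below returns 1 with probability at most $1/2 + \mathrm{negl}(\lambda)$.

Experiment $\mathbf{Expt}^{sec}_{\mathcal{A}}(\lambda)$
  $\mathsf{param} \leftarrow \mathsf{SETUP}_{init}(\lambda)$; $(aux, \mathsf{pk}_{GM}, \mathsf{pk}_{OA}) \leftarrow \mathcal{A}(\mathsf{param})$;
  $\langle \mathsf{pk}, \mathsf{sk}, \mathsf{cert}_{pk} \mid aux \rangle \leftarrow \langle J_{user}, \mathcal{A}(aux) \rangle(\mathsf{pk}_{GM})$;
  $(aux, x, w, L, \mathsf{pk}_{\mathcal{R}}) \leftarrow \mathcal{A}^{\mathsf{DEC}(\mathsf{sk}, .)}(aux)$; If $(x, w) \notin \mathcal{R}$ return 0;
  $b \stackrel{R}{\leftarrow} \{0, 1\}$; $(\psi, coins_{\psi}) \leftarrow \mathsf{CH}^b_{ror}(\lambda, \mathsf{pk}, w, L)$;
  $b' \leftarrow \mathcal{A}^{\mathsf{PROVE}^b_{\mathcal{P},\mathcal{P}'}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}, \mathsf{cert}_{pk}, \mathsf{pk}_{\mathcal{R}}, x, w, \psi, L, coins_{\psi}),\ \mathsf{DEC}^{\neg\langle \psi, L \rangle}(\mathsf{sk}, .)}(aux, \psi)$;
  If $b = b'$ return 1 else return 0;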


ANONYMITY. In anonymity attacks, the adversary controls the whole system but
the opening authority and performs a kind of chosen-ciphertext attack on the
encryption scheme of the OA. She registers two keys $\mathsf{pk}_0, \mathsf{pk}_1$ in database and, for
a pair $(x, w) \in \mathcal{R}$ of her choosing, obtains an encryption of w under $\mathsf{pk}_b$ for some
$b \in \{0, 1\}$ chosen by the challenger. She is granted access to decryption oracles
w.r.t. both keys $\mathsf{pk}_0, \mathsf{pk}_1$. In addition, she may invoke the following oracles:


- $\mathsf{CH}^b_{anon}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}_0, \mathsf{pk}_1, w, L)$: is a challenge oracle that is only queried
once by the adversary. It returns a pair $(\psi, coins_{\psi})$ consisting of a ciphertext
$\psi \leftarrow \mathsf{ENC}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}_b, \mathsf{cert}_{pk_b}, w, L)$ and the coin tosses $coins_{\psi}$ that were
used to generate $\psi$.

- $\mathsf{USER}(\mathsf{pk}_{GM})$: is a stateful oracle simulating two executions of $J_{user}$ to introduce
two honest users in the group. It uses a string keys where the outputs
of the two executions are written.

- $\mathsf{OPEN}(\mathsf{sk}_{OA}, .)$: is a stateless oracle that simulates the opening algorithm on
behalf of the OA and, on input of a GE ciphertext, returns the receiver’s
public key.


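Definition 6. A GE scheme satisfies anonymity if, for any PPT adversary $\mathcal{A}$,
the experiment below returns 1 with a probability not exceeding $1/2 + \mathrm{negl}(\lambda)$.


Experiment $\mathbf{Expt}^{anon}_{\mathcal{A}}(\lambda)$
  $\mathsf{param} \leftarrow \mathsf{SETUP}_{init}(\lambda)$; $(\mathsf{pk}_{OA}, \mathsf{sk}_{OA}) \leftarrow \mathsf{SETUP}_{OA}(\mathsf{param})$;
  $(aux, \mathsf{pk}_{GM}) \leftarrow \mathcal{A}(\mathsf{param}, \mathsf{pk}_{OA})$; $aux \leftarrow \mathcal{A}^{\mathsf{USER}(\mathsf{pk}_{GM}),\ \mathsf{OPEN}(\mathsf{sk}_{OA}, .)}(aux)$;
  If $keys \neq (\mathsf{pk}_0, \mathsf{sk}_0, \mathsf{cert}_{pk_0}, \mathsf{pk}_1, \mathsf{sk}_1, \mathsf{cert}_{pk_1})$ return 0;
  $(aux, x, w, L, \mathsf{pk}_{\mathcal{R}}) \leftarrow \mathcal{A}^{\mathsf{OPEN}(\mathsf{sk}_{OA}, .),\ \mathsf{DEC}(\mathsf{sk}_0, .),\ \mathsf{DEC}(\mathsf{sk}_1, .)}(aux)$;
  If $(x, w) \notin \mathcal{R}$ return 0;
  $b \stackrel{R}{\leftarrow} \{0, 1\}$; $(\psi, coins_{\psi}) \leftarrow \mathsf{CH}^b_{anon}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}_0, \mathsf{pk}_1, w, L)$;
  $b' \leftarrow \mathcal{A}^{\mathcal{P}(\mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}_b, \mathsf{cert}_{pk_b}, x, w, \psi, L, coins_{\psi}),\ \mathsf{OPEN}^{\neg\langle \psi, L \rangle}(\mathsf{sk}_{OA}, .),\ \mathsf{DEC}^{\neg\langle \psi, L \rangle}(\mathsf{sk}_0, .),\ \mathsf{DEC}^{\neg\langle \psi, L \rangle}(\mathsf{sk}_1, .)}(aux, \psi)$;
  If $b = b'$ return 1 else return 0;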


As shown in [29], GE schemes satisfying the above notion necessarily subsume a 
key-private (a.k.a. receiver anonymous) B8] cryptosystem. 


SOUNDNESS. In a soundness attack, the adversary creates the group of receivers
by interacting with the honest GM. Her goal is to produce a ciphertext $\psi$ and a
convincing proof that $\psi$ is valid w.r.t. a relation $\mathcal{R}$ of her choice but either (1)
the opening reveals a receiver’s public key pk that does not belong to any group
member; (2) the output pk of OPEN is not a valid public key (i.e., $\mathsf{pk} \notin \mathcal{PK}$,
where $\mathcal{PK}$ is the space of valid public keys); (3) the ciphertext $\psi$ is not in the
space $\mathcal{C}^{x, L, \mathsf{pk}_{\mathcal{R}}, \mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}}$ of valid ciphertexts. This notion is formalized by a game
where the adversary is given access to a user registration oracle $\mathsf{REG}(\mathsf{sk}_{GM}, .)$
that simulates $J_{GM}$. This oracle maintains a repository database where registered
public keys and their certificates are stored.


Definition 7. A GE scheme is sound if, for any PPT adversary A, the experi- 
ment below returns 1 with negligible probability. 


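Experiment $\mathbf{Expt}^{soundness}_{\mathcal{A}}(\lambda)$
  $\mathsf{param} \leftarrow \mathsf{SETUP}_{init}(\lambda)$; $(\mathsf{pk}_{OA}, \mathsf{sk}_{OA}) \leftarrow \mathsf{SETUP}_{OA}(\mathsf{param})$;
  $(\mathsf{pk}_{GM}, \mathsf{sk}_{GM}) \leftarrow \mathsf{SETUP}_{GM}(\mathsf{param})$;
  $(\mathsf{pk}_{\mathcal{R}}, x, \psi, \pi_{\psi}, L, aux) \leftarrow \mathcal{A}^{\mathsf{REG}(\mathsf{sk}_{GM}, .)}(\mathsf{param}, \mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{sk}_{OA})$;
  If $\mathcal{V}(\psi, L, \pi_{\psi}, \mathsf{pk}_{GM}, \mathsf{pk}_{OA}) = 0$ return 0;
  $\mathsf{pk} \leftarrow \mathsf{OPEN}(\mathsf{sk}_{OA}, \psi, L)$;
  If $((\mathsf{pk} \notin database) \vee (\mathsf{pk} \notin \mathcal{PK}) \vee (\psi \notin \mathcal{C}^{x, L, \mathsf{pk}_{\mathcal{R}}, \mathsf{pk}_{GM}, \mathsf{pk}_{OA}, \mathsf{pk}}))$
    then return 1 else return 0;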


2.3 Groth-Sahai Proof Systems 


In the following notations, for equal-dimension vectors $\vec{A}$ and $\vec{B}$ containing group
elements, $\vec{A} \odot \vec{B}$ stands for their component-wise product.

When based on the DLIN assumption, the Groth-Sahai (GS) proof systems
use a common reference string comprising vectors $\vec{g}_1, \vec{g}_2, \vec{g}_3 \in \mathbb{G}^3$, where
$\vec{g}_1 = (g_1, 1, g)$, $\vec{g}_2 = (1, g_2, g)$ for some $g_1, g_2 \in \mathbb{G}$. To commit to $X \in \mathbb{G}$, one
sets $\vec{C} = (1, 1, X) \odot \vec{g}_1^{\,r} \odot \vec{g}_2^{\,s} \odot \vec{g}_3^{\,t}$ with $r, s, t \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$. When the proof system is
configured to give perfectly sound proofs, $\vec{g}_3$ is chosen as $\vec{g}_3 = \vec{g}_1^{\,\xi_1} \odot \vec{g}_2^{\,\xi_2}$ with
$\xi_1, \xi_2 \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$. Commitments $\vec{C} = (g_1^{r + \xi_1 t}, g_2^{s + \xi_2 t}, X \cdot g^{r + s + t(\xi_1 + \xi_2)})$ are then
Boneh-Boyen-Shacham (BBS) ciphertexts that can be decrypted using $\alpha_1 = \log_g(g_1)$,
$\alpha_2 = \log_g(g_2)$. In the witness indistinguishability (WI) setting, vectors $\vec{g}_1, \vec{g}_2, \vec{g}_3$
are linearly independent and $\vec{C}$ is a perfectly hiding commitment. Under the
DLIN assumption, the two kinds of CRS are indistinguishable.
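The two CRS configurations can be illustrated in a toy model that stores each group element as its discrete logarithm base g modulo a prime, so that products become sums and powers become multiplications. This is a sketch under stated assumptions: the modulus `P` and all function names are hypothetical, and the model has no security. It needs Python 3.8 or later for `pow(x, -1, P)`.

```python
import random

P = 2**61 - 1  # toy prime order (assumption)

def vec_mul(*vectors):
    """Component-wise product of vectors of group elements (logs add)."""
    return tuple(sum(column) % P for column in zip(*vectors))

def vec_pow(vector, k):
    """Raise every component to the power k (logs are multiplied by k)."""
    return tuple(x * k % P for x in vector)

def commit(x, g1, g2, g3, r, s, t):
    """C = (1, 1, X) * g1^r * g2^s * g3^t, with x = log_g(X)."""
    return vec_mul((0, 0, x), vec_pow(g1, r), vec_pow(g2, s), vec_pow(g3, t))

def bbs_decrypt(C, alpha1, alpha2):
    """Return log_g of C3 / (C1^(1/alpha1) * C2^(1/alpha2))."""
    c1, c2, c3 = C
    return (c3 - c1 * pow(alpha1, -1, P) - c2 * pow(alpha2, -1, P)) % P

rng = random.Random(7)
alpha1, alpha2, xi1, xi2, x, r, s, t = (rng.randrange(1, P) for _ in range(8))
g1, g2 = (alpha1, 0, 1), (0, alpha2, 1)      # logs of (g_1, 1, g) and (1, g_2, g)

# Soundness setting: g3 lies in the span of g1 and g2, so decryption recovers X.
g3_sound = vec_mul(vec_pow(g1, xi1), vec_pow(g2, xi2))
recovered = bbs_decrypt(commit(x, g1, g2, g3_sound, r, s, t), alpha1, alpha2)

# WI setting: one linearly independent choice of g3; the same decryption now
# returns X * g^t, i.e. the committed value is masked by fresh randomness.
g3_wi = vec_mul(g3_sound, (0, 0, 1))
masked = bbs_decrypt(commit(x, g1, g2, g3_wi, r, s, t), alpha1, alpha2)
```

In the soundness setting `recovered` equals `x`, whereas in the WI setting the result is shifted by `t`, which is consistent with the commitment hiding X.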

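To commit to an exponent $x \in \mathbb{Z}_p$, one computes $\vec{C} = \vec{\varphi}^{\,x} \odot \vec{g}_1^{\,r} \odot \vec{g}_2^{\,s}$, with
$r, s \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$, using a CRS comprising vectors $\vec{\varphi}, \vec{g}_1, \vec{g}_2$. In the soundness setting
$\vec{\varphi}, \vec{g}_1, \vec{g}_2$ are linearly independent vectors (typically $\vec{\varphi} = \vec{g}_3 \odot (1, 1, g)$ where $\vec{g}_3 =
\vec{g}_1^{\,\xi_1} \odot \vec{g}_2^{\,\xi_2}$) whereas, in the WI setting, choosing $\vec{\varphi} = \vec{g}_1^{\,\xi_1} \odot \vec{g}_2^{\,\xi_2}$ gives a perfectly
hiding commitment since $\vec{C}$ is always a BBS encryption of $1_{\mathbb{G}}$.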

To prove that committed variables satisfy a set of relations, the GS techniques 
replace variables by the corresponding commitments in each relation. The whole 
proof consists of one commitment per variable and one proof element (made of 
a constant number of group elements) per relation. 

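Such proofs are available for pairing-product relations, which are of the type

$$\prod_{i=1}^{n} e(\mathcal{A}_i, \mathcal{X}_i) \cdot \prod_{i=1}^{n} \prod_{j=1}^{n} e(\mathcal{X}_i, \mathcal{X}_j)^{a_{ij}} = t_T,$$

for variables $\mathcal{X}_1, \ldots, \mathcal{X}_n \in \mathbb{G}$ and constants $t_T \in \mathbb{G}_T$, $\mathcal{A}_1, \ldots, \mathcal{A}_n \in \mathbb{G}$, $a_{ij} \in \mathbb{Z}_p$,
for $i, j \in \{1, \ldots, n\}$. Efficient proofs also exist for multi-exponentiation equations

$$\prod_{i=1}^{m} \mathcal{A}_i^{y_i} \cdot \prod_{j=1}^{n} \mathcal{X}_j^{b_j} \cdot \prod_{i=1}^{m} \prod_{j=1}^{n} \mathcal{X}_j^{y_i \gamma_{ij}} = T,$$

for variables $\mathcal{X}_1, \ldots, \mathcal{X}_n \in \mathbb{G}$, $y_1, \ldots, y_m \in \mathbb{Z}_p$ and constants $T, \mathcal{A}_1, \ldots, \mathcal{A}_m \in \mathbb{G}$,
$b_1, \ldots, b_n \in \mathbb{Z}_p$ and $\gamma_{ij} \in \mathbb{Z}_p$, for $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$.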

Multi-exponentiation equations admit zero-knowledge proofs at no additional
cost. On a simulated CRS (prepared for the WI setting), a trapdoor makes it
possible to simulate proofs without knowing witnesses and simulated proofs are
perfectly indistinguishable from real proofs. As for pairing-product equations,
zero-knowledge proofs are often possible but usually come at some expense. In
the paper, we only resort to such NIZK simulators on one occasion.

In both cases, proofs for quadratic equations cost 9 group elements. Linear
pairing-product equations (when $a_{ij} = 0$ for all i, j) take 3 group elements
each. Linear multi-exponentiation equations of the type $\prod_{j=1}^{n} \mathcal{X}_j^{b_j} = T$ (resp.
$\prod_{i=1}^{m} \mathcal{A}_i^{y_i} = T$) demand 3 (resp. 2) group elements.


3 Building Blocks 


Our certification scheme uses a trapdoor commitment to group elements as an 
important ingredient to dispense with proofs of knowledge of users’ private keys. 


3.1 A Trapdoor Commitment to Group Elements 


We need a trapdoor commitment scheme that allows committing to elements of
a group $\mathbb{G}$ where bilinear map arguments are taken. Commitments will have to
be themselves elements of $\mathbb{G}$, which prevents us from using Groth’s scheme
where commitments lie in the range $\mathbb{G}_T$ of the pairing.

Such commitments can be obtained using the perfectly hiding Groth-Sahai
commitment based on the linear assumption recalled in section 2.3. This
commitment uses a common reference string describing a prime order group $\mathbb{G}$ and
a generator $f \in \mathbb{G}$. The commitment key consists of vectors $(\vec{f}_1, \vec{f}_2, \vec{f}_3)$ chosen as
$\vec{f}_1 = (f_1, 1, f)$, $\vec{f}_2 = (1, f_2, f)$ and $\vec{f}_3 = \vec{f}_1^{\,\xi_1} \odot \vec{f}_2^{\,\xi_2} \odot (1, 1, f)^{\xi_3}$, with $f_1, f_2 \stackrel{R}{\leftarrow} \mathbb{G}$,
$\xi_1, \xi_2, \xi_3 \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$. To commit to X, the sender picks $\phi_1, \phi_2, \phi_3 \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$ and sets
$\vec{C}_X = (1, 1, X) \odot \vec{f}_1^{\,\phi_1} \odot \vec{f}_2^{\,\phi_2} \odot \vec{f}_3^{\,\phi_3}$, which, if $\vec{f}_3$ is parsed as $(f_{3,1}, f_{3,2}, f_{3,3})$, can
be written $\vec{C}_X = (f_1^{\phi_1} \cdot f_{3,1}^{\phi_3},\ f_2^{\phi_2} \cdot f_{3,2}^{\phi_3},\ X \cdot f^{\phi_1 + \phi_2} \cdot f_{3,3}^{\phi_3})$. Due to the use of GS proofs,
commitment openings need to only consist of group elements (and no scalar). To
open $\vec{C}_X = (C_1, C_2, C_3)$, the sender reveals $(D_1, D_2, D_3) = (f^{\phi_1}, f^{\phi_2}, f^{\phi_3})$ and
X. The receiver is convinced that the committed value was X by checking that

$e(C_1, f) = e(f_1, D_1) \cdot e(f_{3,1}, D_3)$
$e(C_2, f) = e(f_2, D_2) \cdot e(f_{3,2}, D_3)$
$e(C_3, f) = e(X \cdot D_1 \cdot D_2, f) \cdot e(f_{3,3}, D_3)$.

If a cheating sender can come up with distinct openings of $\vec{C}_X$, we can easily
solve an S2P instance $(g_1, g_2, g_{1,c}, g_{2,d})$. Namely, the commitment key is set as
$(f_1, f_2, f_{3,1}, f_{3,2}) = (g_1, g_2, g_{1,c}, g_{2,d})$ and $f, f_{3,3}$ are chosen at random. When
the adversary outputs $(X, (D_1, D_2, D_3))$ and $(X', (D_1', D_2', D_3'))$, we must
simultaneously have $e(f_1, D_1/D_1') = e(f_{3,1}, D_3'/D_3)$, $e(f_2, D_2/D_2') = e(f_{3,2}, D_3'/D_3)$
and $e((X D_1 D_2)/(X' D_1' D_2'), f) = e(f_{3,3}, D_3'/D_3)$. Hence, setting $u = D_1/D_1'$,
$v = D_2/D_2'$ and $w = D_3'/D_3$ solves the S2P problem as $(u, v, w)$ can only be
trivial if $X' = X$.

Using the trapdoor $(\xi_1, \xi_2, \xi_3)$, the receiver can equivocate commitments.
Given a commitment $\vec{C}_X$ and its opening $(X, (D_1, D_2, D_3))$, one can trapdoor
open $\vec{C}_X$ to any other $X' \in \mathbb{G}$ (and without knowing $\log_g(X')$) by computing

$D_1' = D_1 \cdot (X'/X)^{\xi_1/\xi_3}$, $D_2' = D_2 \cdot (X'/X)^{\xi_2/\xi_3}$, $D_3' = (X/X')^{1/\xi_3} \cdot D_3$.
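The opening check and the trapdoor opening can be exercised in a toy model where each group element is stored as its discrete logarithm base f modulo a prime and the pairing multiplies logarithms. This is a sketch under stated assumptions: the modulus `P` and all function names are hypothetical. Unlike the real scheme, the model necessarily handles the logarithm of X', so it only checks that the algebra of the three verification equations works out.

```python
import random

P = 2**61 - 1  # toy prime order (assumption), not a secure parameter

def pair(x, y):
    """Toy pairing on discrete logs base f: e(f^x, f^y) = e(f, f)^(x*y)."""
    return x * y % P

def keygen(rng):
    a1, a2, xi1, xi2, xi3 = (rng.randrange(1, P) for _ in range(5))
    # f has log 1; f3 = f1^xi1 * f2^xi2 * (1, 1, f)^xi3, component-wise.
    key = {"f": 1, "f1": a1, "f2": a2,
           "f3": (a1 * xi1 % P, a2 * xi2 % P, (xi1 + xi2 + xi3) % P)}
    return key, (xi1, xi2, xi3)

def commit(key, X, rng):
    p1, p2, p3 = (rng.randrange(1, P) for _ in range(3))
    f31, f32, f33 = key["f3"]
    C = ((key["f1"] * p1 + f31 * p3) % P,
         (key["f2"] * p2 + f32 * p3) % P,
         (X + p1 + p2 + f33 * p3) % P)
    return C, (p1, p2, p3)              # opening = logs of (f^p1, f^p2, f^p3)

def verify(key, C, X, D):
    f, (f31, f32, f33) = key["f"], key["f3"]
    return (pair(C[0], f) == (pair(key["f1"], D[0]) + pair(f31, D[2])) % P
            and pair(C[1], f) == (pair(key["f2"], D[1]) + pair(f32, D[2])) % P
            and pair(C[2], f) == (pair((X + D[0] + D[1]) % P, f)
                                  + pair(f33, D[2])) % P)

def equivocate(trapdoor, X, D, X_new):
    xi1, xi2, xi3 = trapdoor
    delta = (X_new - X) % P             # log of X'/X
    k = pow(xi3, -1, P)
    return ((D[0] + delta * xi1 * k) % P,
            (D[1] + delta * xi2 * k) % P,
            (D[2] - delta * k) % P)

rng = random.Random(42)
key, trapdoor = keygen(rng)
X, X_new = 1234, 5678                   # logs of two arbitrary group elements
C, D = commit(key, X, rng)
D_new = equivocate(trapdoor, X, D, X_new)
honest_ok = verify(key, C, X, D)
wrong_ok = verify(key, C, X_new, D)
equivocated_ok = verify(key, C, X_new, D_new)
```

The original opening verifies for X only, while the trapdoor opening makes the same commitment verify for X'.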


3.2 A Public Key Certification Scheme 


We use a primitive that we call non-interactive certification scheme, which can 
be viewed as a signature scheme that only allows signing public keys from a 
specific public key space PK. These keys should be signed while retaining alge- 
braic properties that make it possible to prove knowledge of a public key and its 
corresponding certificate in an efficient way. In particular, signing hashed public 
keys is proscribed. In the interactive setting, several papers (e.g., BEJ) describe 
efficient interactive protocols where a public key is jointly generated by a user 
and a certification authority in such a way that the user eventually obtains a 
certified public key and no one else learns the underlying private key. In this pa- 
per, we aim at minimizing the amount of interaction and let users generate their 
public key entirely on their own before requesting their certification. Ideally, we 
would like to be able to sign public keys without even requiring users to prove 
knowledge of their private key and, in particular, without having to first rewind 
a proof of knowledge so as to extract the user’s private key in the security proof. 

A certification scheme consists of algorithms (Setup, Certify, CertVerify). The
first one is run by a certification authority (CA) that, on input of global
parameters cp, generates a key pair $(SK, PK) \leftarrow \mathsf{Setup}(\mathsf{cp})$. On input of cp, SK and
a user’s public key pk, Certify generates a certificate $\mathsf{cert}_{pk}$. The procedure CertVerify
takes as input cp, PK, pk and $\mathsf{cert}_{pk}$ and outputs either 0 or 1.

Correctness mandates that $\mathsf{CertVerify}(\mathsf{cp}, PK, \mathsf{pk}, \mathsf{cert}_{pk}) = 1$ when $\mathsf{cert}_{pk} \leftarrow
\mathsf{Certify}(\mathsf{cp}, SK, \mathsf{pk})$. The (strong) unforgeability [I] requirement is the same as in
signature schemes. The adversary is supplied with a CA’s public key PK and
access to a certification oracle $\mathsf{Certify}(SK, .)$ that can be queried for arbitrary
public keys $\mathsf{pk} \in \mathcal{PK}$. Her goal is to produce a new pair $(\mathsf{pk}^\star, \mathsf{cert}^\star_{pk^\star})$ (i.e., if $\mathsf{pk}^\star$
was queried to $\mathsf{Certify}(SK, .)$, the output must have been different from $\mathsf{cert}^\star_{pk^\star}$).
In the description hereafter, we assume common public parameters cp consisting
of bilinear groups $(\mathbb{G}, \mathbb{G}_T)$ of prime order $p > 2^{\lambda}$, for a security parameter
$\lambda$, and a generator $g \stackrel{R}{\leftarrow} \mathbb{G}$. We also assume that certified public keys always
consist of a fixed number n of group elements (i.e., $\mathcal{PK} = \mathbb{G}^n$).

INTUITION. The scheme borrows from the Boyen-Waters group signature
in the use of the HSDH assumption. A simplified version involves a CA that
holds a public key $PK = (\Omega = g^{\omega}, A = e(g,g)^{\alpha}, u, u_0, u_1 = g^{\beta_1}, \ldots, u_n = g^{\beta_n})$,
for private elements $SK = (\omega, \alpha, \beta_1, \ldots, \beta_n)$, where n denotes the number of
group elements that certified public keys consist of. To certify a public key
$\mathsf{pk} = (X_1 = g^{x_1}, \ldots, X_n = g^{x_n})$, the CA chooses an exponent $c_{ID} \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$ and
computes $S_1 = (g^{\alpha})^{1/(\omega + c_{ID})}$, $S_2 = g^{c_{ID}}$, $S_3 = u^{c_{ID}}$, $S_4 = (u_0 \cdot \prod_{i=1}^{n} X_i^{\beta_i})^{c_{ID}}$
and $S_5 = (S_{5,1}, \ldots, S_{5,n}) = (X_1^{c_{ID}}, \ldots, X_n^{c_{ID}})$. Verification then checks whether
$e(S_1, \Omega \cdot S_2) = A$ and $e(S_2, u) = e(g, S_3)$ as in [0]. It must also be checked that
$e(S_4, g) = e(u_0, S_2) \cdot \prod_{i=1}^{n} e(u_i, S_{5,i})$ and $e(S_{5,i}, g) = e(X_i, S_2)$ for $i = 1, \ldots, n$.

The security of this simplified scheme can only be proven if, when answering
certification queries, the simulator can control the private keys $(x_1, \ldots, x_n)$ and
force them to be random values of its choice. To allow the simulator to sign
arbitrary public keys without knowing the private keys, we modify the scheme so
that the CA rather signs commitments (calculated as in the trapdoor commitment
of section 3.1) to public key elements $X_1, \ldots, X_n$. In the security proof, the
simulator first generates a signature on n commitments $\vec{C}_i = (C_{i,1}, C_{i,2}, C_{i,3})$ to
$1_{\mathbb{G}}$ that are all generated in such a way that it knows $\log_g(C_{i,j})$ for $i = 1, \ldots, n$
and $j = 1, 2, 3$. Using the trapdoor of the commitment scheme, it can then open
$\vec{C}_i$ to any arbitrary public key element $X_i$ without knowing $\log_g(X_i)$.

This use of the trapdoor commitment is reminiscent of a technique (no- 
tably used in [I8]) to construct signature schemes in the standard model using 
chameleon hash functions [BJ]: the simulator first signs messages of its choice 
using a basic signature scheme and then “equivocates” the chameleon hashes to 
make them correspond to adversarially-chosen messages. 


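Setup(cp): given common public parameters $\mathsf{cp} = \{g, \mathbb{G}, \mathbb{G}_T\}$, select $u, u_0 \stackrel{R}{\leftarrow} \mathbb{G}$,
$\alpha, \omega \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$ and set $A = e(g,g)^{\alpha}$, $\Omega = g^{\omega}$. Pick $\beta_{i,1}, \beta_{i,2}, \beta_{i,3} \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$
and set $\vec{u}_i = (u_{i,1}, u_{i,2}, u_{i,3}) = (g^{\beta_{i,1}}, g^{\beta_{i,2}}, g^{\beta_{i,3}})$ for $i = 1, \ldots, n$. Choose
$f, f_1, f_2, f_{3,1}, f_{3,2}, f_{3,3} \stackrel{R}{\leftarrow} \mathbb{G}$ that define a commitment key consisting of vectors
$\vec{f}_1 = (f_1, 1, f)$, $\vec{f}_2 = (1, f_2, f)$ and $\vec{f}_3 = (f_{3,1}, f_{3,2}, f_{3,3})$. Define the
private/public key pair as $SK = (\alpha, \omega, \{\vec{\beta}_i = (\beta_{i,1}, \beta_{i,2}, \beta_{i,3})\}_{i=1,\ldots,n})$ and

$PK = (\mathbf{f} = (\vec{f}_1, \vec{f}_2, \vec{f}_3),\ A = e(g,g)^{\alpha},\ \Omega = g^{\omega},\ u,\ u_0,\ \{\vec{u}_i\}_{i=1,\ldots,n})$.

Certify(cp, SK, pk): parse SK as $(\alpha, \omega, \{\vec{\beta}_i\}_{i=1,\ldots,n})$, pk as $(X_1, \ldots, X_n)$ and do
the following.

1. For each $i \in \{1, \ldots, n\}$, pick $\phi_{i,1}, \phi_{i,2}, \phi_{i,3} \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$ and compute a commitment
$\vec{C}_i = (C_{i,1}, C_{i,2}, C_{i,3}) = (f_1^{\phi_{i,1}} \cdot f_{3,1}^{\phi_{i,3}},\ f_2^{\phi_{i,2}} \cdot f_{3,2}^{\phi_{i,3}},\ X_i \cdot f^{\phi_{i,1} + \phi_{i,2}} \cdot f_{3,3}^{\phi_{i,3}})$
and the matching de-commitment $(D_{i,1}, D_{i,2}, D_{i,3}) = (f^{\phi_{i,1}}, f^{\phi_{i,2}}, f^{\phi_{i,3}})$.

2. Choose $c_{ID} \stackrel{R}{\leftarrow} \mathbb{Z}_p^*$, compute $S_1 = (g^{\alpha})^{1/(\omega + c_{ID})}$, $S_2 = g^{c_{ID}}$, $S_3 = u^{c_{ID}}$ and

$S_4 = \big(u_0 \cdot \prod_{i=1}^{n} (C_{i,1}^{\beta_{i,1}} \cdot C_{i,2}^{\beta_{i,2}} \cdot C_{i,3}^{\beta_{i,3}})\big)^{c_{ID}}$

$S_5 = \{(S_{5,i,1}, S_{5,i,2}, S_{5,i,3})\}_{i=1,\ldots,n} = \{(C_{i,1}^{c_{ID}}, C_{i,2}^{c_{ID}}, C_{i,3}^{c_{ID}})\}_{i=1,\ldots,n}$

Return $\mathsf{cert}_{pk} = (\{(C_{i,1}, C_{i,2}, C_{i,3}), (D_{i,1}, D_{i,2}, D_{i,3})\}_{i=1,\ldots,n},\ S_1, S_2, S_3, S_4, S_5)$.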


CertVerify(cp, PK, pk, $\mathsf{cert}_{pk}$): parse pk as $(X_1, \ldots, X_n)$ and $\mathsf{cert}_{pk}$ as above.
Return 1 if, for $i = 1, \ldots, n$, it holds that $X_i \in \mathbb{G}$ and


$e(C_{i,1}, f) = e(f_1, D_{i,1}) \cdot e(f_{3,1}, D_{i,3})$ (1)
$e(C_{i,2}, f) = e(f_2, D_{i,2}) \cdot e(f_{3,2}, D_{i,3})$ (2)
$e(C_{i,3}, f) = e(X_i \cdot D_{i,1} \cdot D_{i,2}, f) \cdot e(f_{3,3}, D_{i,3})$, (3)


and if the following checks are also satisfied. Otherwise, return 0.


$e(S_1, \Omega \cdot S_2) = A$ (4)

$e(S_2, u) = e(g, S_3)$ (5)

$e(S_4, g) = e(u_0, S_2) \cdot \prod_{i=1}^{n} \big(e(u_{i,1}, S_{5,i,1}) \cdot e(u_{i,2}, S_{5,i,2}) \cdot e(u_{i,3}, S_{5,i,3})\big)$, (6)

$e(S_{5,i,j}, g) = e(C_{i,j}, S_2)$ for $i = 1, \ldots, n$, $j = 1, 2, 3$ (7)


A certificate comprises 9n + 4 group elements. It would be interesting to avoid 
this linear dependency on n without destroying the algebraic properties that 
render the scheme compatible with Groth-Sahai proofs. 

Regarding the security of this scheme, the idea of the proof of the following
theorem is sketched in Appendix A. Due to space limitations, the complete proof
is detailed in the full version of the paper.


Theorem 1. The scheme is a secure non-interactive certification system if the 
HSDH, FlexDH and S2P problems are all hard in G. 


We believe that the above certification scheme is of interest in its own right. 
For instance, it can be used to construct non-frameable group signatures that 
are secure in the concurrent join model of without resorting to random 
oracles. To the best of our knowledge, the Kiayias- Yung construction has 
remained the only scalable group signature where joining supports concurrency 
at both ends while requiring the smallest amount of interaction. In the standard 
model, our certification scheme thus appears to provide the first way to achieve
the same result. In this case, we have n = 1 (since prospective group members 
only need to certify one group element if non-frameability is ensured by signing 
messages as in Groth’s group signature BJ) so that membership certificates 
comprise 13 group elements and their shape is fully compatible with GS proofs. 


2 Non-frameable group signatures described in achieve concurrent security by
having the prospective user generate an extractable commitment to some secret 
exponent (which the simulator can extract without rewinding using the trapdoor of 
the commitment) and prove that the committed value is the discrete log. of a public 
value. In the standard model, this technique requires interaction and the proof should 
be simulatable in zero-knowledge when proving security against framing attacks. 
Another technique BI] requires users to prove knowledge of their secret exponent 
using Groth-Sahai non-interactive proofs. It is nevertheless space-demanding as each 
bit of committed exponent requires its own extractable GS commitment. 
3.3 Public Key Encryption Schemes Based on the Linear Problem


We need cryptosystems based on the DLIN assumption. The first one is 
Shacham’s variant [37] of Cramer-Shoup [IA and, since it is key-private [8], 
we use it to encrypt witnesses. We also use Kiltz’s tag-based encryption (TBE) 
scheme [31], where the validity of ciphertexts is publicly verifiable, to encrypt
receivers’ public keys under the public key of the opening authority. 


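SHACHAM’S LINEAR CRAMER-SHOUP. If we assume public generators $g_1, g_2, g$
that are parts of public parameters, each receiver’s public key is made of n = 6
group elements

$X_1 = g_1^{x_1} g^{x}$    $X_3 = g_1^{y_1} g^{y}$    $X_5 = g_1^{z_1} g^{z}$
$X_2 = g_2^{x_2} g^{x}$    $X_4 = g_2^{y_2} g^{y}$    $X_6 = g_2^{z_2} g^{z}$.

To encrypt $m \in \mathbb{G}$ under the label L, the sender picks $r, s \leftarrow_R \mathbb{Z}_p^*$ and computes

$\psi_{CS} = (U_1, U_2, U_3, U_4, U_5) = (g_1^{r}, g_2^{s}, g^{r+s}, m \cdot X_5^{r} X_6^{s}, (X_1 X_3^{\alpha})^{r} \cdot (X_2 X_4^{\alpha})^{s})$,

where $\alpha = H(U_1, U_2, U_3, U_4, L) \in \mathbb{Z}_p^*$ is a collision-resistant hash.$^3$ Given $(\psi_{CS}, L)$,
the receiver computes $\alpha$. He returns $\perp$ if $U_5 \neq U_1^{x_1+\alpha y_1} U_2^{x_2+\alpha y_2} U_3^{x+\alpha y}$ and
$m = U_4/(U_1^{z_1} U_2^{z_2} U_3^{z})$ otherwise.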
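
To make the encryption and decryption equations concrete, here is a minimal sketch of the linear Cramer-Shoup variant over a tiny prime-order subgroup of the integers modulo a safe prime. The group, the parameter sizes, the SHA-256 stand-in for H, and the function names are all assumptions chosen for illustration; the exponent names (x1, x2, x, y1, y2, y, z1, z2, z) follow the key layout described above.

```python
import hashlib
import random

Q = 2039               # toy safe prime, Q = 2*P + 1 (assumption: demo-sized only)
P = 1019               # prime order of the subgroup of squares modulo Q
G1, G2, G = 9, 16, 4   # non-trivial squares mod Q, so each has order P

def H(*parts):
    """Stand-in for the hash H: SHA-256 reduced mod P (an assumption)."""
    digest = hashlib.sha256(repr(parts).encode()).digest()
    return int.from_bytes(digest, "big") % P

def keygen(rng):
    x1, x2, x, y1, y2, y, z1, z2, z = (rng.randrange(1, P) for _ in range(9))
    pk = (pow(G1, x1, Q) * pow(G, x, Q) % Q,   # X1
          pow(G2, x2, Q) * pow(G, x, Q) % Q,   # X2
          pow(G1, y1, Q) * pow(G, y, Q) % Q,   # X3
          pow(G2, y2, Q) * pow(G, y, Q) % Q,   # X4
          pow(G1, z1, Q) * pow(G, z, Q) % Q,   # X5
          pow(G2, z2, Q) * pow(G, z, Q) % Q)   # X6
    return (x1, x2, x, y1, y2, y, z1, z2, z), pk

def encrypt(pk, m, label, rng):
    X1, X2, X3, X4, X5, X6 = pk
    r, s = rng.randrange(1, P), rng.randrange(1, P)
    U1, U2, U3 = pow(G1, r, Q), pow(G2, s, Q), pow(G, r + s, Q)
    U4 = m * pow(X5, r, Q) * pow(X6, s, Q) % Q
    a = H(U1, U2, U3, U4, label)
    U5 = pow(X1 * pow(X3, a, Q), r, Q) * pow(X2 * pow(X4, a, Q), s, Q) % Q
    return (U1, U2, U3, U4, U5)

def decrypt(sk, ct, label):
    x1, x2, x, y1, y2, y, z1, z2, z = sk
    U1, U2, U3, U4, U5 = ct
    a = H(U1, U2, U3, U4, label)
    expected = (pow(U1, x1 + a * y1, Q) * pow(U2, x2 + a * y2, Q)
                * pow(U3, x + a * y, Q)) % Q
    if U5 != expected:
        return None                      # plays the role of the reject symbol
    mask = pow(U1, z1, Q) * pow(U2, z2, Q) * pow(U3, z, Q) % Q
    return U4 * pow(mask, -1, Q) % Q     # modular inverse needs Python 3.8+
```

A round trip such as `decrypt(sk, encrypt(pk, m, "L", rng), "L")` should return `m`, while altering the last ciphertext component makes the validity check fail.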


KILTZ’S TAG-BASED ENCRYPTION SCHEME. In [31], Kiltz described a TBE
scheme based on the same assumption. The public key is $(Y_1, Y_2, Y_3, Y_4) =
(g^{y_1}, g^{y_2}, g^{y_3}, g^{y_4})$ if $g \in \mathbb{G}$ is part of public parameters. To encrypt $m \in \mathbb{G}$
under a tag $t \in \mathbb{Z}_p^*$, the sender picks $w_1, w_2 \leftarrow_R \mathbb{Z}_p^*$ and computes

$\psi_K = (V_1, V_2, V_3, V_4, V_5) = (Y_1^{w_1}, Y_2^{w_2}, (g^{t} Y_3)^{w_1}, (g^{t} Y_4)^{w_2}, m \cdot g^{w_1+w_2})$.

To decrypt $\psi_K$, the receiver checks that $V_3 = V_1^{(t+y_3)/y_1}$ and $V_4 = V_2^{(t+y_4)/y_2}$. If so,
it outputs the plaintext $m = V_5/(V_1^{1/y_1} V_2^{1/y_2})$. Unlike $\psi_{CS}$, the well-formedness
of $\psi_K$ is publicly verifiable in bilinear groups. The Canetti-Halevi-Katz
paradigm turns this scheme into a full-fledged CCA2 scheme by deriving the
tag t from the verification key VK of a one-time signature, the private key SK of
which is used to sign $(V_1, V_2, V_3, V_4, V_5)$.
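
The tag-based scheme can be sketched in the same toy setting. The group parameters and function names below are assumptions for illustration only; the private-key consistency check is used here because the public (pairing-based) check needs a bilinear group, which this sketch does not model.

```python
import random

Q, P, G = 2039, 1019, 4   # toy safe-prime group (assumption); G has prime order P

def keygen(rng):
    sk = tuple(rng.randrange(1, P) for _ in range(4))   # (y1, y2, y3, y4)
    return sk, tuple(pow(G, y, Q) for y in sk)          # (Y1, Y2, Y3, Y4)

def encrypt(pk, m, t, rng):
    Y1, Y2, Y3, Y4 = pk
    w1, w2 = rng.randrange(1, P), rng.randrange(1, P)
    gt = pow(G, t, Q)
    return (pow(Y1, w1, Q), pow(Y2, w2, Q),
            pow(gt * Y3, w1, Q), pow(gt * Y4, w2, Q),
            m * pow(G, w1 + w2, Q) % Q)

def decrypt(sk, ct, t):
    y1, y2, y3, y4 = sk
    V1, V2, V3, V4, V5 = ct
    inv1, inv2 = pow(y1, -1, P), pow(y2, -1, P)   # 1/y1 and 1/y2 modulo P
    # Validity: V3 = V1^((t+y3)/y1) and V4 = V2^((t+y4)/y2)
    if V3 != pow(V1, (t + y3) * inv1 % P, Q) or V4 != pow(V2, (t + y4) * inv2 % P, Q):
        return None
    mask = pow(V1, inv1, Q) * pow(V2, inv2, Q) % Q   # equals g^(w1+w2)
    return V5 * pow(mask, -1, Q) % Q
```

Decrypting under the tag used for encryption recovers the message, whereas any other tag modulo P fails the validity check, which is the property the CCA2 transformation relies on.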


4 A GE Scheme with Non-interactive Proofs 


We build a non-interactive group encryption scheme for the Diffie-Hellman
relation $R = \{(X, Y), W\}$ where $e(g, W) = e(X, Y)$, for which the keys are
$pk_R = \{\mathbb{G}, \mathbb{G}_T, g\}$ and $sk_R = \varepsilon$.

The construction slightly departs from the modular design of in that com- 
mitments to the receiver’s public key and certificate are part of the proof (instead 
of the ciphertext), which simplifies the proof of message-security. The security 
of the scheme eventually relies on the HSDH, FlexDH and DLIN assumptions. 
All security proofs are available in the full version of the paper. 


3 The proof of CCA2-security [B7] only requires a universal one-way hash function 
(UOWHF) BJ but collision-resistance is required by the proof of key-privacy in [3]. 
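SETUP$_{init}$($\lambda$): choose bilinear groups $(\mathbb{G}, \mathbb{G}_T)$ of order $p > 2^{\lambda}$, $g \leftarrow_R \mathbb{G}$ and
$g_1 = g^{\alpha_1}$, $g_2 = g^{\alpha_2}$ with $\alpha_1, \alpha_2 \leftarrow_R \mathbb{Z}_p^*$. Define $\vec{g}_1 = (g_1, 1, g)$, $\vec{g}_2 = (1, g_2, g)$
and $\vec{g}_3 = \vec{g}_1^{\,\xi_1} \odot \vec{g}_2^{\,\xi_2}$ with $\xi_1, \xi_2 \leftarrow_R \mathbb{Z}_p^*$, which form a CRS $\vec{g} = (\vec{g}_1, \vec{g}_2, \vec{g}_3)$
for the perfect soundness setting. Select a strongly unforgeable (as defined
in [Ī]) one-time signature scheme $\Sigma = (\mathcal{G}, \mathcal{S}, \mathcal{V})$ and a random member
$H : \{0,1\}^* \to \mathbb{Z}_p$ of a collision-resistant hash family. Public parameters
consist of param = $\{\lambda, \mathbb{G}, \mathbb{G}_T, g, \vec{g}, \Sigma, H\}$.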


SETUP$_{GM}$(param): runs the setup algorithm of the certification scheme
described in Section 3 with n = 6. The obtained public key consists of

$pk_{GM} = (\vec{f}, A = e(g,g)^{\alpha}, \Omega = g^{\omega}, u, u_0, \{\vec{u}_i\}_{i=1,\ldots,6})$

and the matching private key is $sk_{GM} = (\alpha, \omega, \{\vec{\beta}_i = (\beta_{i,1}, \beta_{i,2}, \beta_{i,3})\}_{i=1,\ldots,6})$.
SETUP$_{OA}$(param): generates $pk_{OA} = (Y_1, Y_2, Y_3, Y_4) = (g^{y_1}, g^{y_2}, g^{y_3}, g^{y_4})$, as a
public key for Kiltz’s tag-based encryption scheme [31], and the corresponding
private key as $sk_{OA} = (y_1, y_2, y_3, y_4)$.
JOIN: the user sends a linear Cramer-Shoup public key pk = $(X_1, \ldots, X_6) \in \mathbb{G}^6$
to the GM and obtains a certificate

$cert_{pk} = (\{(C_{i,1}, C_{i,2}, C_{i,3}), (D_{i,1}, D_{i,2}, D_{i,3})\}_{i=1,\ldots,6}, S_1, S_2, S_3, S_4, S_5)$.


ENC($pk_{GM}$, $pk_{OA}$, pk, $cert_{pk}$, W, L): to encrypt $W \in \mathbb{G}$ such that $((X, Y), W) \in R$
(for public elements $X, Y \in \mathbb{G}$), parse $pk_{GM}$, $pk_{OA}$ and pk as above and do
the following.

1. Generate a one-time signature key pair $(SK, VK) \leftarrow \mathcal{G}(\lambda)$.

2. Choose $r, s \leftarrow_R \mathbb{Z}_p^*$ and compute a linear CS encryption of W, the result
of which is denoted by $\psi_{CS}$, under the label $L_1 = L \| VK$ as per section 3.3
(and using the collision-resistant hash function specified by param).

3. For $i = 1, \ldots, 6$, choose $w_{i,1}, w_{i,2} \leftarrow_R \mathbb{Z}_p^*$ and encrypt $X_i$ under $pk_{OA}$ using
Kiltz’s TBE with the tag VK as described in section 3.3. Let $\psi_{K_i}$ be the
ciphertexts.

4. Set the ciphertext $\psi$ as $\psi = VK \| \psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| \sigma$ where $\sigma$ is
obtained as $\sigma = \mathcal{S}(SK, (\psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| L))$.

Return $(\psi, L)$ and $coins_{\psi}$ consist of $\{(w_{i,1}, w_{i,2})\}_{i=1,\ldots,6}$, (r, s). If the one-
time signature of [23] is used, VK and $\sigma$ take 3 and 2 group elements,
respectively, so that $\psi$ comprises 40 group elements.


P($pk_{GM}$, $pk_{OA}$, pk, $cert_{pk}$, (X, Y), W, $\psi$, L, $coins_{\psi}$): parse $pk_{GM}$, $pk_{OA}$, pk and $\psi$
as above. Conduct the following steps.

1. Generate commitments (as explained in section 2.3) to the 9n + 4 = 58
group elements that $cert_{pk}$ consists of. The resulting overall commitment
$com_{cert_{pk}}$ contains 184 group elements.

2. Generate commitments to the public key elements pk = $(X_1, \ldots, X_6)$ and
obtain $com_{pk} = \{com_{X_i}\}_{i=1,\ldots,6}$, which consists of 18 group elements.

3. Generate a proof $\pi_{cert_{pk}}$ that $com_{cert_{pk}}$ is a commitment to a valid
certificate for the public key contained in $com_{pk}$. For each $i = 1, \ldots, 6$,
relations (1)-(3) cost 9 elements to prove (and thus 54 elements altogether).
The quadratic equation (4) takes 9 elements and linear ones
(5)-(6) both require 3 elements. Finally, (7) is a set of 18 linear equations
which demand 54 elements altogether. The whole proof $\pi_{cert_{pk}}$ thus
takes 123 group elements.

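4. For $i = 1, \ldots, 6$, generate a NIZK proof $\pi_{eq\text{-}key,i}$ that $com_{X_i}$ (which
is part of $com_{pk}$) and $\psi_{K_i}$ are encryptions of the same $X_i$. If $\psi_{K_i}$ comprises
$(V_{i,1}, V_{i,2}, V_{i,5}) = (Y_1^{w_{i,1}}, Y_2^{w_{i,2}}, X_i \cdot g^{w_{i,1}+w_{i,2}})$ and $com_{X_i}$ is parsed
as $(c_{X_i,1}, c_{X_i,2}, c_{X_i,3}) = (g_1^{\theta_{i1}} \cdot g_{3,1}^{\theta_{i3}}, g_2^{\theta_{i2}} \cdot g_{3,2}^{\theta_{i3}}, X_i \cdot g^{\theta_{i1}+\theta_{i2}} \cdot g_{3,3}^{\theta_{i3}})$, where
$w_{i,1}, w_{i,2} \in coins_{\psi}$, $\theta_{i1}, \theta_{i2}, \theta_{i3} \in \mathbb{Z}_p$ and $\vec{g}_3 = (g_{3,1}, g_{3,2}, g_{3,3})$, this
amounts to proving knowledge of values $w_{i,1}, w_{i,2}, \theta_{i1}, \theta_{i2}, \theta_{i3}$ such that

$\left(\frac{V_{i,1}}{c_{X_i,1}}, \frac{V_{i,2}}{c_{X_i,2}}, \frac{V_{i,5}}{c_{X_i,3}}\right) = \left(Y_1^{w_{i,1}} \cdot g_1^{-\theta_{i1}} \cdot g_{3,1}^{-\theta_{i3}},\; Y_2^{w_{i,2}} \cdot g_2^{-\theta_{i2}} \cdot g_{3,2}^{-\theta_{i3}},\; g^{w_{i,1}+w_{i,2}-\theta_{i1}-\theta_{i2}} \cdot g_{3,3}^{-\theta_{i3}}\right)$.

Committing to $w_{i,1}, w_{i,2}, \theta_{i1}, \theta_{i2}, \theta_{i3}$ introduces 90 group elements
whereas the above relations only require two elements each. Overall,
proof elements $\pi_{eq\text{-}key,1}, \ldots, \pi_{eq\text{-}key,6}$ incur 126 elements.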

5. Generate a NIZK proof $\pi_{val\text{-}enc}$ that $\psi_{CS} = (U_1, U_2, U_3, U_4, U_5)$ is a valid
CS encryption. This requires committing to the underlying encryption
exponents $r, s \in coins_{\psi}$ and proving that $U_1 = g_1^{r}$, $U_2 = g_2^{s}$, $U_3 = g^{r+s}$
(which only takes 3 times 2 elements as base elements are public) and
$U_5 = (X_1 X_3^{\alpha})^{r} \cdot (X_2 X_4^{\alpha})^{s}$ (which takes 9 elements since base elements are
themselves variables). Including commitments $com_r$ and $com_s$ to exponents
r and s, $\pi_{val\text{-}enc}$ demands 21 group elements overall.

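6. Generate a NIZK proof $\pi_R$ that $\psi_{CS}$ encrypts a group element $W \in \mathbb{G}$
such that $((X, Y), W) \in R$. To this end, generate a commitment $com_W =
(c_{W,1}, c_{W,2}, c_{W,3}) = (g_1^{\theta_1} \cdot g_{3,1}^{\theta_3}, g_2^{\theta_2} \cdot g_{3,2}^{\theta_3}, W \cdot g^{\theta_1+\theta_2} \cdot g_{3,3}^{\theta_3})$ and prove that the
underlying W is the same as the one for which $U_4 = W \cdot X_5^{r} X_6^{s}$ in $\psi_{CS}$.
In other words, prove knowledge of $r, s, \theta_1, \theta_2, \theta_3$ such that

$\left(\frac{U_1}{c_{W,1}}, \frac{U_2}{c_{W,2}}, \frac{U_4}{c_{W,3}}\right) = \left(g_1^{r-\theta_1} \cdot g_{3,1}^{-\theta_3},\; g_2^{s-\theta_2} \cdot g_{3,2}^{-\theta_3},\; g^{-\theta_1-\theta_2} \cdot g_{3,3}^{-\theta_3} \cdot X_5^{r} \cdot X_6^{s}\right)$.    (8)

Commitments to r, s are already part of $\pi_{val\text{-}enc}$. Committing to $\theta_1, \theta_2, \theta_3$
takes 9 elements. Proving the first two relations of (8) requires 4 elements
whereas the third one is quadratic and its proof is 9 elements. Proving
the linear pairing-product relation $e(g, W) = e(X, Y)$ in NIZK$^4$ demands
9 elements. Since $\pi_R$ includes $com_W$, it entails a total of 34 elements.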
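
The element counts quoted in steps 3 to 6 can be re-derived with a few lines of arithmetic. The sketch below only restates the per-item costs given in the text (3 group elements per commitment, 2 per linear relation with public bases, 9 per quadratic relation); the variable names are hypothetical.

```python
COMMIT = 3      # group elements per commitment, as stated in the text
LINEAR = 2      # elements per linear relation over public bases
QUADRATIC = 9   # elements per quadratic relation

# Step 3: relations (1)-(3) for six key elements, equation (4),
# equations (5)-(6) at 3 elements each, and the 18 equations of (7).
pi_cert = 6 * 9 + QUADRATIC + 2 * 3 + 18 * 3

# Step 4: per key element, five committed exponents and three linear relations.
pi_eq_key = 6 * (5 * COMMIT + 3 * LINEAR)

# Step 5: three linear relations, one quadratic one, commitments to r and s.
pi_val_enc = 3 * LINEAR + QUADRATIC + 2 * COMMIT

# Step 6: com_W, commitments to theta_1..theta_3, two linear relations,
# one quadratic relation, and the pairing-product relation (9 elements).
pi_R = COMMIT + 3 * COMMIT + 2 * LINEAR + QUADRATIC + 9

print(pi_cert, pi_eq_key, pi_val_enc, pi_R)
```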


4 It requires introducing an auxiliary variable $\mathcal{X}$ and proving that $e(g, W) = e(\mathcal{X}, Y)$
and $\mathcal{X} = X$, for variables $W, \mathcal{X}$ and constants $g, X, Y$. The two proofs take 3 elements
each and 3 elements are needed to commit to $\mathcal{X}$.
The proof $\pi_{\psi} = com_{cert_{pk}} \| com_{pk} \| \pi_{cert_{pk}} \| \pi_{eq\text{-}key,1} \| \cdots \| \pi_{eq\text{-}key,6} \| \pi_{val\text{-}enc} \| \pi_R$
eventually takes 516 elements.


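V(param, $\psi$, L, $\pi_{\psi}$, $pk_{GM}$, $pk_{OA}$): parse $pk_{GM}$, $pk_{OA}$, pk, $\psi$ and $\pi_{\psi}$ as above. Return
1 if and only if $\mathcal{V}(VK, \sigma, (\psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| L)) = 1$, all proofs verify
and if $\psi_{K_1}, \ldots, \psi_{K_6}$ are all valid tag-based encryptions w.r.t. the tag VK.


DEC(sk, $\psi$, L): parse the ciphertext $\psi$ as $VK \| \psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| \sigma$. Return $\perp$ if
$\mathcal{V}(VK, \sigma, (\psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| L)) = 0$. Otherwise, use sk to decrypt $(\psi_{CS}, L)$.


OPEN($sk_{OA}$, $\psi$, L): parse the ciphertext $\psi$ as $VK \| \psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| \sigma$. Return
$\perp$ if $\psi_{K_1}, \ldots, \psi_{K_6}$ are not all valid TBE ciphertexts w.r.t. the tag VK or if
$\mathcal{V}(VK, \sigma, (\psi_{CS} \| \psi_{K_1} \| \cdots \| \psi_{K_6} \| L)) = 0$. Otherwise, decrypt $\psi_{K_1}, \ldots, \psi_{K_6}$ using
$sk_{OA}$ and return the resulting pk = $(X_1, \ldots, X_6)$.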


From an efficiency standpoint, the length of ciphertexts is about 1.25 kB in an 
implementation using symmetric pairings with a 256-bit group order, which is 
more compact than in the Paillier-based scheme of where ciphertexts take 2.5 
kB using 1024-bit moduli. Moreover, our proofs only require 16.125 kB, which 
is significantly cheaper than in the original GE scheme [29], where interactive 
proofs reach a communication cost of 70 kB to achieve a $2^{-50}$ knowledge error.
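
The quoted sizes follow directly from the element counts, under the assumption (implied by the figures, not stated explicitly) that each group element is encoded in 256 bits. A small sketch, with hypothetical names:

```python
ELEMENT_BITS = 256  # assumed encoding size of one group element

def size_kb(num_elements, bits=ELEMENT_BITS):
    """Size in kB (1 kB = 1024 bytes) of a tuple of group elements."""
    return num_elements * bits / 8 / 1024

# Ciphertext: VK (3) + linear Cramer-Shoup part (5)
# + six Kiltz ciphertexts (5 each) + one-time signature (2).
ciphertext_elements = 3 + 5 + 6 * 5 + 2

print(ciphertext_elements, size_kb(ciphertext_elements))  # 40 elements, 1.25 kB
print(size_kb(516))                                       # proof of 516 elements: 16.125 kB
```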
A Sketch of the Proof of Theorem 1


The security proof of the certification scheme considers three kinds of forgeries 
in the attack game. 


- Type I forgeries: are such that the fake certificate $cert^{\star}_{pk}$ contains a tuple of
elements $(S_1^{\star}, S_2^{\star}, S_3^{\star})$ that never appeared in outputs of certification queries.

- Type II forgeries: are such that $cert^{\star}_{pk}$ contains a triple $(S_1^{\star}, S_2^{\star}, S_3^{\star})$ that
appeared in the output of some query but $cert^{\star}_{pk}$ also contains commitments
$\{(C^{\star}_{i,1}, C^{\star}_{i,2}, C^{\star}_{i,3})\}_{i=1,\ldots,n}$ that do not match those in the output of that query.

- Type III forgeries: are such that $(S_1^{\star}, S_2^{\star}, S_3^{\star})$ and $\{(C^{\star}_{i,1}, C^{\star}_{i,2}, C^{\star}_{i,3})\}_{i=1,\ldots,n}$
are identical in $cert^{\star}_{pk}$ and in the output of some certification query. On
the other hand, the public key $pk^{\star} = (X_1^{\star}, \ldots, X_n^{\star})$ is not the one that was
certified in that query.


Type I forgeries are easily seen to break the HSDH assumption whereas Type 
II and Type III forgeries give rise to algorithms solving the FlexDH and S2P 
problems, respectively. Due to space limitations, the details are deferred to the 
full version of the paper. 
Abstract. Predicate encryption is a recent generalization of identity-based
encryption (IBE), broadcast encryption, attribute-based encryption,
and more. A natural question is whether there exist black-box
constructions of predicate encryption based on generic building blocks, 
e.g., trapdoor permutations. Boneh et al. (FOCS 2008) recently gave a 
negative answer for the specific case of IBE. 

We show both negative and positive results. First, we identify a com- 
binatorial property on the sets of predicates/attributes and show that, 
for any sets having this property, no black-box construction of predicate 
encryption from trapdoor permutations (or even CCA-secure encryption) 
is possible. Our framework implies the result of Boneh et al. as a special 
case, and also rules out, e.g., black-box constructions of forward-secure 
encryption and broadcast encryption (with many excluded users). On 
the positive side, we identify conditions under which predicate encryp- 
tion schemes can be constructed based on any CPA-secure (standard) 
encryption scheme. 


1 Introduction 


In a predicate encryption scheme an authority generates a master public
key and a master secret key, and uses the master secret key to derive personal
secret keys for individual users. A personal secret key corresponds to a
predicate in some class $\mathcal{F}$, and ciphertexts are associated (by the sender) with an
attribute in some set $\mathbb{A}$; a ciphertext associated with the attribute $I \in \mathbb{A}$ can be
decrypted by a secret key $SK_f$ corresponding to the predicate $f \in \mathcal{F}$ if and only
if $f(I) = 1$. The basic security guarantee provided by such schemes is that a
ciphertext associated with an attribute I hides all information about the
underlying message unless one has a personal secret key giving the explicit ability to
decrypt; in other words, if an adversary A holds keys $SK_{f_1}, \ldots, SK_{f_\ell}$ for which
$f_1(I) = \cdots = f_\ell(I) = 0$, then A should learn nothing about the message. (A
formal definition is given later.)

By choosing F and A appropriately, predicate encryption yields as special 
cases many notions that are interesting in their own right. For example, by taking 


* Work done while visiting IBM. Research supported by DARPA, and by the US Army 
Research Laboratory and the UK Ministry of Defence under agreement number 
W911NF-06-3-0001. 
$\mathbb{A} = \{0,1\}^n$ and letting $\mathcal{F} = \{f_{ID}\}_{ID \in \{0,1\}^n}$ be the class of point functions
(so that $f_{ID}(ID') = 1$ iff $ID = ID'$) we recover the notion of identity-based
encryption (IBE) [9H]. Similarly, it can be observed that predicate encryption
encompasses fuzzy IBE [I8], forward-secure (public-key) encryption [7],
(public-key) broadcast encryption P], attribute-based encryption [I5], and more as
special cases.

Most (though not all) existing constructions of predicate encryption schemes 
rely on bilinear maps. A natural question is: what are the minimal assumptions 
on which predicate encryption can be based? Of course, the answer will depend 
on the specific predicate class F and attribute set A of interest; in particular, 
Boneh and Waters [6] show that if F is polynomial size then (for any A) one can 
construct a predicate encryption scheme for (F, A) from any (standard) public- 
key encryption scheme. On the other hand, Boneh et al. [5] have recently shown
that there is no black-box construction of IBE from trapdoor permutations. 


1.1 Our Results 


The specific question we consider is: for which (F, A) can we construct a predicate 
encryption scheme over (F, A) based on CPA-secure encryption? We show both 
negative and positive results. Before describing these results in more detail, we 
provide some background intuition. 

A natural combinatorial construction of a predicate encryption scheme over
some (F, A) from a CPA-secure encryption scheme (Gen, Enc, Dec) is as follows:
The authority includes several public keys $pk_1, \ldots, pk_q$ in the master public
key, and each personal secret key is some subset of the corresponding secret
keys $sk_1, \ldots, sk_q$. Encryption of a message m with respect to an attribute I
requires “sharing” m in some way to yield $m_1, \ldots, m_q$, and the resulting ciphertext
is $Enc_{pk_1}(m_1), \ldots, Enc_{pk_q}(m_q)$. Intuitively, this works if:


Correctness: Let $SK_f = \{sk_{i_1}, \ldots, sk_{i_t}\}$ be a personal secret key for which
$f(I) = 1$. Then the “shares” $m_{i_1}, \ldots, m_{i_t}$ should enable recovery of m.
Security: Let $\{sk_{i_1}, \ldots, sk_{i_t}\} = \bigcup_{f \in \mathcal{F} : f(I) = 0} SK_f$. Then the set of “shares”
$m_{i_1}, \ldots, m_{i_t}$ should leak no information about m.$^1$


Roughly, our negative result can be interpreted as showing that this is essentially 
the only way to construct predicate encryption (in a black-box way) from CPA- 
secure encryption; our positive result shows how to implement the above for a 
specific class of predicate encryption schemes. We now provide further details. 


Impossibility results. Our negative results are in the same model used by 
Boneh et al. [5], which builds on the model used in the seminal work of
Impagliazzo and Rudich [J]. Specifically, as in [5] our negative results hold relative to
a random oracle (with trapdoor) and so rule out black-box constructions from 
trapdoor permutations as well as from any (standard) CCA-secure public-key 
encryption scheme. 


1 This is stronger than what is required, but makes sense in a black-box setting where 
computational hardness comes only from the underlying CPA-secure scheme. 
A slightly informal statement of our result follows. Fix $\{(\mathcal{F}_n, \mathbb{A}_n)\}_{n \in \mathbb{N}}$, a
sequence of predicate classes and attribute sets indexed by the security
parameter n. We say that $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ can be q-covered if for every set system $\{S_f\}_{f \in \mathcal{F}_n}$
with $S_f \subseteq [q(n)]$ (where $[q] = \{1, \ldots, q\}$), there are polynomially-many predicates
$f^*, f_1, \ldots, f_p \in \mathcal{F}_n$ such that, with high probability:


1. $S_{f^*} \subseteq \bigcup_{i=1}^{p} S_{f_i}$.
2. There exists an $I \in \mathbb{A}_n$ with $f_1(I) = \cdots = f_p(I) = 0$ but $f^*(I) = 1$.


$\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ is easily covered if it is q-covered for every polynomial q. We show:


Theorem. If $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ is easily covered, there is no black-box construction
of a predicate encryption scheme over $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ based on trapdoor
permutations (or CCA-secure encryption).


Intuitively, if $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ is easily covered then the combinatorial approach
discussed earlier cannot work: letting q(n) be the (necessarily) polynomial number
of keys for the underlying (standard) encryption scheme, no matter how the
secret keys $\{sk_i\}_{i=1}^{q}$ are apportioned to the personal secret keys $\{SK_f\}_{f \in \mathcal{F}_n}$, an
adversary can carry out the following attack (cf. Definition 2 below):


1. Request the keys $SK_{f_1}, \ldots, SK_{f_p}$, where each $SK_{f_i} = \{sk_{i,1}, \ldots\} \subseteq \{sk_j\}_{j=1}^{q}$.

2. Request the challenge ciphertext C to be encrypted using an attribute I for
which $f_1(I) = \cdots = f_p(I) = 0$ but $f^*(I) = 1$.

3. Compute the key $SK_{f^*} \subseteq \bigcup_i SK_{f_i}$ and use this key to decrypt C.


This constitutes a valid attack since $SK_{f^*}$ suffices to decrypt C yet the adversary
only requested $SK_{f_1}, \ldots, SK_{f_p}$, none of which suffices on its own to decrypt C.
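
The combinatorial core of this attack can be checked mechanically. The following is a minimal sketch, assuming a set system is given as a dictionary from predicate names to sets of indices of underlying secret keys; the helper names and the IBE-style toy instance are hypothetical.

```python
def covered(set_system, f_star, requested):
    """Condition 1: S_{f*} is contained in the union of the requested S_{f_i}."""
    union = set()
    for f in requested:
        union |= set_system[f]
    return set_system[f_star] <= union

def attack_applies(set_system, predicate, f_star, requested, attribute):
    """True if the requested keys cover S_{f*}, every requested predicate
    evaluates to 0 on the attribute, and f* evaluates to 1 on it."""
    excluded = all(not predicate(f, attribute) for f in requested)
    return (covered(set_system, f_star, requested)
            and excluded
            and predicate(f_star, attribute))

def point_predicate(identity, attribute):
    """IBE-style predicate: f_ID(I) = 1 iff ID == I."""
    return identity == attribute

# Toy set system over q = 4 underlying keys.
keys = {"alice": {1, 2}, "bob": {2, 3}, "carol": {1, 3}, "dave": {4}}

print(attack_applies(keys, point_predicate, "carol", ["alice", "bob"], "carol"))
print(attack_applies(keys, point_predicate, "dave", ["alice", "bob"], "dave"))
```

In this toy instance the keys of alice and bob together contain both underlying keys of carol, so a ciphertext addressed to carol is exposed; dave's key set is not covered, so the same request does not help against him.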

Turning this intuition into a formal proof must, in particular, implicitly show
that the combinatorial approach sketched earlier is essentially the only black-box
approach to building predicate encryption schemes from trapdoor permutations.
Moreover, we actually prove a stronger quantitative version of the above theorem
showing, roughly, that if $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ is q-covered then any predicate encryption
scheme over $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ must use at least q + 1 underlying encryption keys.

One might wonder whether the “easily covered” condition is useful for
determining whether there exist black-box constructions of predicate encryption
schemes over $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ of interest. We show that it is, in that the following
corollary can be proven fairly easily given the above:


Corollary. There are no black-box constructions of (1) identity-based
encryption, (2) forward-secure encryption (for a super-polynomial number of time
periods), or (3) broadcast encryption (where a super-polynomial number of users
can be excluded) from trapdoor permutations.


The first result was shown in [5]; the point is that our impossibility result strictly
generalizes theirs. Moreover, as indicated earlier, we prove a quantitative version
of their result (as well as all other results stated in the above corollary).


Positive result. On the positive side, we show that the combinatorial approach
suggested at the outset can be implemented for $\{(\mathcal{F}_n, \mathbb{A}_n)\}_n$ having the following
property: for each $I \in \mathbb{A}_n$ there are at most polynomially-many $f \in \mathcal{F}_n$ for which
$f(I) = 0$; i.e., for each I there are at most polynomially-many predicates that
are “excluded”. (The positive result from [6], where there are only
polynomially-many predicates, is thus obtained as a corollary.) This is proved by analogy to
broadcast encryption, using the combinatorial techniques from [IJ].


1.2 Comparison to the Results of Boneh et al. 


Our proof relies heavily on the impossibility result from [5]. Our contribution 
lies in finding the right combinatorial generalization (specifically, the “easily 
covered” property described earlier) of the specific property used by Boneh et al. 
for the particular case of IBE, adapting their proof to our setting, and applying 
their ideas to the more general case of predicate encryption. Our generalization, 
in turn, allows us to show impossibility for several cryptosystems of interest 
besides IBE (cf. the corollary stated earlier), as well as to give quantitative 
versions of their earlier result. Our positive results have no analogue in [5]. 


2 Definitions 


2.1 Predicate Encryption 


We provide a functional definition of predicate encryption, followed by a weak 
definition of security that we use when proving impossibility and the standard 
definition of security that we use when proving our positive result. 


Definition 1. Fix $\{(\mathcal{F}_n, \mathbb{A}_n)\}_{n \in \mathbb{N}}$, where $\mathcal{F}_n$ is a set of (efficiently computable)
predicates over the set of attributes $\mathbb{A}_n$. A predicate encryption scheme over
$\{(\mathcal{F}_n, \mathbb{A}_n)\}_{n \in \mathbb{N}}$ consists of four PPT algorithms (Setup, KeyGen, Enc, Dec) such that:


- Setup is a deterministic algorithm that takes as input a master secret key
$MSK \in \{0,1\}^n$ and outputs a master public key MPK.

- KeyGen is a deterministic algorithm that takes as input the master secret key
MSK and a predicate $f \in \mathcal{F}_n$ and outputs a secret key $SK_f = KeyGen_{MSK}(f)$.
(The assumption that KeyGen is deterministic is without loss of generality,
since MSK may include a key for a pseudorandom function.)

- Enc takes as input the public key MPK, an attribute $I \in \mathbb{A}_n$, and a bit b. It
outputs a ciphertext $C \leftarrow Enc_{MPK}(I, b)$.

- Dec takes as input a secret key SK and ciphertext C. It outputs either a
bit b or the distinguished symbol $\perp$.


It is required that for all n, all $MSK \in \{0,1\}^n$ and MPK = Setup(MSK),
all $f \in \mathcal{F}_n$ and $SK_f = KeyGen_{MSK}(f)$, all $I \in \mathbb{A}_n$, and all $b \in \{0,1\}$, that if
$f(I) = 1$ then $Dec_{SK_f}(Enc_{MPK}(I, b)) = b$.


Definition 2. A predicate encryption scheme over (F, A) is weakly payload
hiding if the advantage of any PPT adversary A in the following game is negligible:

1. $A(1^n)$ outputs $I^* \in \mathbb{A}_n$ and $f_1, \ldots, f_p \in \mathcal{F}_n$ such that $f_i(I^*) = 0$ for all i.

2. Choose $MSK \leftarrow \{0,1\}^n$; let MPK := Setup(MSK) and set $SK_{f_i}$ :=
KeyGen(MSK, $f_i$) for all i. Choose $b \leftarrow \{0,1\}$, and compute the ciphertext
$C^* \leftarrow Enc_{MPK}(I^*, b)$. Then A is given $(MPK, SK_{f_1}, \ldots, SK_{f_p}, C^*)$.

3. A outputs b' and succeeds if b' = b.


The advantage of A is defined as $|\Pr[A \text{ succeeds}] - \frac{1}{2}|$.
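
The game of Definition 2 can be written as a small harness. The sketch below is illustrative only: the scheme and adversary interfaces, the deliberately insecure `LeakyScheme`, and the `ReadingAdversary` are all hypothetical stand-ins, chosen so that the adversary wins with probability 1 and the flow of the game is easy to follow.

```python
import secrets

class LeakyScheme:
    """Deliberately insecure toy scheme: the ciphertext carries the bit in the clear."""
    def setup(self, msk):
        return "mpk"
    def keygen(self, msk, f):
        return f                          # the "key" is just the predicate
    def enc(self, mpk, attribute, bit):
        return (attribute, bit)
    def dec(self, sk, ciphertext):
        attribute, bit = ciphertext
        return bit if sk(attribute) else None

class ReadingAdversary:
    def choose(self, n):
        never = lambda attribute: False   # f(I*) = 0, as the game requires
        return "I*", [never]
    def guess(self, mpk, keys, challenge):
        return challenge[1]               # read the leaked bit

def weak_payload_hiding_game(scheme, adversary, n):
    """One run of the game; returns True if the adversary guesses b."""
    target, predicates = adversary.choose(n)                 # step 1
    assert all(not f(target) for f in predicates)
    msk = secrets.token_bytes(max(1, n // 8))                # step 2
    mpk = scheme.setup(msk)
    keys = [scheme.keygen(msk, f) for f in predicates]
    b = secrets.randbelow(2)
    challenge = scheme.enc(mpk, target, b)
    return adversary.guess(mpk, keys, challenge) == b        # step 3
```

Against a scheme meeting the definition, the fraction of winning runs should stay close to 1/2; here it is 1 because the toy scheme leaks the payload.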


Definition 3. A predicate encryption scheme over (F, A) is payload hiding if
the advantage of any PPT adversary A in the following game is negligible:


1. A random $MSK \in \{0,1\}^n$ is chosen, and A is given MPK := Setup(MSK).

2. A adaptively requests keys $SK_{f_1}, \ldots$ corresponding to predicates $f_1, \ldots \in \mathcal{F}_n$.

3. At some point, A outputs $I^* \in \mathbb{A}_n$. A random $b \in \{0,1\}$ is chosen and A is
given the ciphertext $C^* \leftarrow Enc_{MPK}(I^*, b)$. A may continue to request keys
for predicates of its choice.

4. A outputs b' and succeeds if (1) A never requested a key for a predicate f
with $f(I^*) = 1$, and (2) b' = b.


The advantage of A is defined as $|\Pr[A \text{ succeeds}] - \frac{1}{2}|$.


Our construction of Section B]can be modified to achieve the even stronger notion 
of attribute hiding; we refer to for a definition. 


2.2 A Random Trapdoor Permutation Oracle 


We assume the reader is familiar with the usual model in which black-box impos- 
sibility results are proved; see for further details. We show an oracle O 
relative to which trapdoor permutations and CCA-secure encryption exist, yet 
any construction of a predicate encryption scheme (for certain (F, A)) relative 
to O is insecure against a polynomial-time adversary given access to O and a 
PSPACE oracle. Our oracle O = (g,e,d) is defined as follows, for each n € N: 


- g is chosen uniformly from the space of permutations on $\{0,1\}^n$. We view g
as taking a secret key sk as input, and returning a public key pk.

- $e : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$ maps a public key pk and a “message”
$m \in \{0,1\}^n$ to a “ciphertext” $c \in \{0,1\}^n$. It is chosen uniformly subject
to the constraint that $e(pk, \cdot)$ is a permutation on $\{0,1\}^n$ for every pk.

- $d : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$ maps a secret key sk and a ciphertext c
to a message m. We require that d(sk, c) outputs the unique m for which
e(g(sk), m) = c.
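
For very small n the oracle can be sampled explicitly, which makes the three components easy to inspect. The following sketch is an illustration only (the function name and the table-based representation are assumptions; n-bit strings are represented as integers below 2^n).

```python
import random

def sample_oracle(n, seed=0):
    """Sample O = (g, e, d) on n-bit strings by building full lookup tables."""
    rng = random.Random(seed)
    domain = list(range(2 ** n))

    def random_permutation():
        image = domain[:]
        rng.shuffle(image)
        return image

    g_table = random_permutation()                          # sk -> pk
    e_table = {pk: random_permutation() for pk in domain}   # e(pk, .) is a permutation

    def g(sk):
        return g_table[sk]

    def e(pk, m):
        return e_table[pk][m]

    def d(sk, c):
        # The unique m with e(g(sk), m) = c.
        return e_table[g(sk)].index(c)

    return g, e, d
```

With the tables in hand one can confirm that `d(sk, e(g(sk), m)) == m` for every secret key and message; the exponential table size is of course why this only works for toy values of n.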


With overwhelming probability O is a trapdoor permutation [05]. Moreover, 
since the components of O are chosen at random subject to the above con- 
straints (and not with some “defect” as in, e.g., [[0]), O implies CCA-secure 
encryption [I]. 

We denote a query $\alpha$ to O as, e.g., $\alpha = [g(sk) = pk]$ and similarly for e and
d queries. In describing our attack in the next section, we often use a partial 
oracle O’ that is defined only on some subset of the possible inputs. We always 
enforce that such oracles be consistent: 
Definition 4. A partial oracle O' = (g', e', d') is consistent if:


1. For every $pk \in \{0,1\}^n$, the (partial) function $e'(pk, \cdot)$ is one-to-one.

2. For every $sk \in \{0,1\}^n$, the (partial) function $d'(sk, \cdot)$ is one-to-one.

3. For all $x \in \{0,1\}^n$, and all sk such that g'(sk) = pk is defined, the value
e'(pk, x) = c is defined if and only if d'(sk, c) = x is defined.
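
The three conditions translate directly into a check over finite tables. The sketch below assumes a partial oracle is stored as three dictionaries (`g` maps sk to pk, `e` maps (pk, x) to c, `d` maps (sk, c) to x); this representation and the function name are assumptions made for illustration.

```python
def is_consistent(g, e, d):
    """Check the three conditions of a consistent partial oracle."""
    def one_to_one(table):
        # For each fixed key, no two inputs may map to the same output.
        seen = set()
        for (key, _inp), out in table.items():
            if (key, out) in seen:
                return False
            seen.add((key, out))
        return True

    if not one_to_one(e) or not one_to_one(d):   # conditions 1 and 2
        return False
    for sk, pk in g.items():                     # condition 3
        enc_pairs = {(x, c) for (k, x), c in e.items() if k == pk}
        dec_pairs = {(x, c) for (k, c), x in d.items() if k == sk}
        if enc_pairs != dec_pairs:
            return False
    return True
```

For example, the tables `g = {1: 7}`, `e = {(7, 0): 5}`, `d = {(1, 5): 0}` are consistent, while dropping the entry of `d` violates the third condition.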


3 An Impossibility Result for Predicate Encryption 


We define a combinatorial property on $(\mathcal{F}_n, \mathbb{A}_n)$ and formally state our
impossibility result. We describe in Section 3.1 an adversary A attacking any black-box
construction of a predicate encryption scheme satisfying the conditions of our
theorem; an analysis of A is given in Appendix A and the full version.


Fix a set $\mathcal{F}$ and a positive integer q, and let $[q] = \{1, \ldots, q\}$. An $\mathcal{F}$-set system
over [q] is a collection of sets $\{S_f\}_{f \in \mathcal{F}}$ where each $f \in \mathcal{F}$ is associated with a
set $S_f \subseteq [q]$.


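Definition 5. Let $\{(\mathcal{F}_n, \mathbb{A}_n)\}_{n \in \mathbb{N}}$ be a sequence of predicates and attributes. We
say $\{(\mathcal{F}_n, \mathbb{A}_n)\}_{n \in \mathbb{N}}$ can be q-covered if there exist PPT algorithms $(A_1, A_2, A_3)$,
where $A_2(1^n, f)$ is deterministic and outputs $I \in \mathbb{A}_n$ with $f(I) = 1$, such that
for n sufficiently large:


For any $\mathcal{F}_n$-set system $\{S_f\}_{f \in \mathcal{F}_n}$ over [q(n)], if we compute
$f^* \leftarrow A_1(1^n)$; $I^* = A_2(1^n, f^*)$; $f_1, \ldots, f_p \leftarrow A_3(1^n, f^*)$,


then with probability at least 4/5,
1. $S_{f^*} \subseteq \bigcup_{i=1}^{p} S_{f_i}$, and
2. $f_i(I^*) = 0$ for all i.


$\{(\mathcal{F}_n, \mathbb{A}_n)\}_{n \in \mathbb{N}}$ is easily covered if it can be q-covered for every polynomial q.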


Although the above definition may seem rather complex and hard to use, we 
show in Section ]that it can be applied quite easily to several interesting classes 
of predicate encryption schemes. Moreover, the definition is natural given the 
attack we will describe in the following section. 

A black-box construction of predicate encryption is q-bounded if each of its
algorithms makes at most q queries to O. We now state our main result:


Theorem 1. If $\{(\mathcal{F}_n, \mathbb{A}_n)\}$ can be q-covered, then there is no q-bounded
black-box construction of a weakly payload-hiding predicate encryption scheme over
$\{(\mathcal{F}_n, \mathbb{A}_n)\}$ from trapdoor permutations (or CCA-secure encryption).


Since each algorithm defining the predicate encryption scheme can make at most
polynomially-many queries to its oracle, we have


Corollary 1. If $\{(\mathcal{F}_n, \mathbb{A}_n)\}$ is easily covered, there is no black-box construction
of a weakly payload-hiding predicate encryption scheme over $\{(\mathcal{F}_n, \mathbb{A}_n)\}$ from
trapdoor permutations (or CCA-secure encryption).
3.1 The Attack


Fix an $\{(\mathcal{F}_n, \mathbb{A}_n)\}$ that can be q-covered, and let PE = (Setup, KeyGen, Enc, Dec)
be a predicate encryption scheme over $\{(\mathcal{F}_n, \mathbb{A}_n)\}$ each of whose algorithms
makes at most q = poly(n) queries to O = (g, e, d). We assume, without loss of
generality, that before any algorithm of PE makes a query of the form [d(sk, x)],
it first makes the query [g(sk)].

We begin the proof of Theorem 1 by describing an adversary A attacking PE.
Adversary A is given access to O and makes a polynomial number of calls to this
oracle; as described, A is not efficient but it runs in polynomial time given access
to a PSPACE-complete oracle (or if P = NP) and this suffices to prove
black-box impossibility as in previous work [29075]. Our description of the attack is
directly motivated by the attacker described in [5].

Let $A_1$, $A_2$, and $A_3$ be as guaranteed by Definition 5, and let p = poly(n)
bound the number of predicates output by $A_3$. Throughout A’s execution, when
it makes a query to O it stores the query and the response in a list L. We also
require that before A makes any query of the form [d(sk, x)], it first makes the
query [g(sk)]. Furthermore, once the query [g(sk) = pk] has been made then
[e(pk, x) = y] is added to L if and only if [d(sk, y) = x] is added to L.


Setup and challenge. $A(1^n)$ computes $f^* \leftarrow A_1(1^n)$, $I^* := A_2(1^n, f^*)$, and
$(f_1, \ldots, f_p) \leftarrow A_3(1^n, f^*)$. Then:
1. If $f_i(I^*) = 0$ for all i, then A outputs $(I^*, f_1, \ldots, f_p)$ and receives the values
$(MPK, SK_{f_1}, \ldots, SK_{f_p}, C^*)$ from the challenger (cf. Definition 2).
2. Otherwise, A aborts and outputs a random bit $b' \leftarrow \{0,1\}$.


Step 1: Discovering important public keys. For i = 1 to p, adversary A
does the following:


1. Compute $I_{f_i} = A_2(1^n, f_i)$, and choose random $b \leftarrow \{0,1\}$ and $r \leftarrow \{0,1\}^n$.
2. Compute $Dec^{O}_{SK_{f_i}}(Enc^{O}_{MPK}(I_{f_i}, b; r))$, storing all O-queries in the list L.


Step 2: Discovering frequent queries for I*. A repeats the following q- p? 
times: Choose random b — {0,1} and r — {0,1}"; compute Enc; px (I*, 6; 1), 
storing all O-queries in L. 


Step 3: Discovering secret queries and decrypting the challenge. A 
chooses k — [q- p?] and runs the following k times. 


1. A uniformly generates a secret key MSK' and a consistent partial
oracle O' for which (1) $Setup^{O'}(MSK') = MPK$; (2) for all i it holds that
$KeyGen^{O'}_{MSK'}(f_i) = SK_{f_i}$; (3) the oracle O' is consistent with L; and (4) the
key $SK'_{f^*} := KeyGen^{O'}_{MSK'}(f^*)$ is well-defined.

We denote by L' the set of queries in O' that are not in L (the “invented
queries”). Note that $|L'| \le q \cdot (p+2)$, since at most q queries are made by Setup
and KeyGen(f) makes at most q queries for each of $SK'_{f^*}, SK_{f_1}, \ldots, SK_{f_p}$.

2. A chooses $b \leftarrow \{0,1\}$ and $r \leftarrow \{0,1\}^n$, and computes $C := Enc^{O}_{MPK}(I^*, b; r)$
(storing all O-queries in L). For an oracle O'' defined below, A then does:
(a) In iteration k' < k, adversary A computes $Dec^{O''}_{SK'_{f^*}}(C)$.
(b) In iteration k, adversary A computes $b' = Dec^{O''}_{SK'_{f^*}}(C^*)$.


Output: A outputs the bit b' computed in the k-th iteration of step 3.


Before defining the oracle $O''$ used above, we introduce some notation. Let L,
$O'$, and $MSK'$ be as above, and note that we can view L and $O'$ as tuples of
(partial) functions $(g, e, d)$ and $(g', e', d')$ where $g'$, $e'$, and $d'$ extend $g$, $e$, and $d$,
respectively. Define the following:


— $Q'_S$ is the set of pk for which $[g'(sk) = pk]$ is queried during computation of
$\mathsf{Setup}^{O'}(MSK')$.

— $Q'_K$ is the set of pk for which $[g'(sk) = pk]$ is queried during computation of
$\mathsf{KeyGen}^{O'}_{MSK'}(f)$ for some $f \in \{f^*, f_1, \ldots, f_p\}$.

— $Q'_{K-S} = Q'_K \setminus Q'_S$.

— $L_g$ is the set of pk for which the query $[g(sk) = pk]$ is in L.


Note that A can compute each of these sets from its view. Note further that
$Q'_S$, $Q'_K$, $Q'_{K-S}$, and $O'$ are fixed throughout an iteration of step 3, but $L_g$ may
change as queries are answered.

Oracle $O''$ is defined as follows. For any query whose answer is defined by $O'$,
return that answer. Otherwise:


1. For an encryption query $e(pk, x)$ with $pk \in Q'_{K-S} \setminus L_g$, return a random
   $y$ consistent with the rest of $O''$. Act analogously for a decryption query
   $d(sk, y)$ with $pk \in Q'_{K-S} \setminus L_g$ (where $pk = g(sk)$).

2. For a decryption query $d(sk, y)$, if there exists a pk with $[g(sk) = pk] \in O'$
   but² there exists an $sk' \ne sk$ with $[g(sk') = pk] \in L$, then use $O''$ to answer
   the query $d(sk', y)$.

3. In any other case, query the real oracle O and return the result. Store the
   query/answer in L (note that this might affect $L_g$ as well).


An analysis of A, proving Theorem 1, appears in Appendix A and the full version
of our paper. The analysis is very similar to the one given in [5], with the main
difference being Proposition 1.


4 Impossibility for Specific Cases 


We use Theorem 1 to rule out black-box constructions of predicate encryption
schemes in several specific cases of interest. Specifically, we consider the cases of 
identity-based encryption, forward-secure encryption, and broadcast encryption. 
We begin with a useful lemma. 


² Although $O'$ is chosen to be consistent, a conflict can occur since L is updated as A
makes additional queries to the real oracle O.

Lemma 1. Fix $q(\cdot)$, and assume $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$ has the following property: For
sufficiently large $n$, there exist $f_1, \ldots, f_{5q} \in \mathcal{F}_n$ and $I_1, \ldots, I_{5q} \in \mathcal{A}_n$ such that:


For all $i \in \{1, \ldots, 5q\}$ it holds that $f_i(I_i) = 1$ but $f_j(I_i) = 0$ for $j > i$.


Then $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$ can be q-covered. If the above holds for every polynomial $q$,
then $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$ is easily covered.


Proof. We show that, under the stated assumption, $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$ satisfies the
definition of being q-covered. Fix $q$ and $n$ large enough so that the condition of the lemma holds,
and let $f_1, \ldots, f_{5q}$ and $I_1, \ldots, I_{5q}$ be as stated. Define algorithms $A_1, A_2, A_3$ as
follows:

1. $A_1(1^n)$ chooses $i \leftarrow \{1, \ldots, 5q\}$ and outputs $f^* = f_i$.

2. $A_2(1^n, f^*)$ finds $i$ for which $f^* = f_i$ and outputs $I^* = I_i$.

3. $A_3(1^n, f^*)$ finds $i$ for which $f^* = f_i$ and outputs $f_{i+1}, \ldots, f_{5q}$. (If $i = 5q$
   then output nothing.)


Note that $A_2(1^n, f^*)$ always outputs $I^*$ with $f^*(I^*) = 1$. We show that for any
$\mathcal{F}_n$-set system $\{S_f\}_{f \in \mathcal{F}_n}$ over $[q]$, the conditions of that definition hold. We begin
with the following claim:


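Claim. For any $\mathcal{F}_n$-set system $\{S_f\}_{f \in \mathcal{F}_n}$ over $[q]$, there are at most $q$ values
$i \in \{1, \ldots, 5q\}$ for which $S_{f_i} \not\subseteq \bigcup_{i < j \le 5q} S_{f_j}$. (By convention, the union is the
empty set if $i = 5q$.)

Proof. Define $\bar{S}_i := \bigcup_{i < j \le 5q} S_{f_j}$, with $\bar{S}_{5q} = \emptyset$. Note that $\bar{S}_{i-1} = \bar{S}_i \cup S_{f_i}$, and
so $S_{f_i} \not\subseteq \bigcup_{i < j \le 5q} S_{f_j} = \bar{S}_i$ iff $\bar{S}_i \subsetneq \bar{S}_{i-1}$. Since


$$\bar{S}_{5q} \subseteq \bar{S}_{5q-1} \subseteq \cdots \subseteq \bar{S}_1 \subseteq [q],$$


there can be at most $q$ indices $i$ where this occurs.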


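Fixing an arbitrary $\mathcal{F}_n$-set system $\{S_f\}_{f \in \mathcal{F}_n}$ over $[q]$, let $\mathcal{I} \subseteq \{1, \ldots, 5q\}$ be the
set of indices $i$ for which $S_{f_i} \subseteq \bigcup_{i < j \le 5q} S_{f_j}$; the claim above shows that $|\mathcal{I}| \ge 4q$.
If $A_1$ chooses $i \in \mathcal{I}$ then:

1. $S_{f^*} = S_{f_i} \subseteq \bigcup_{i < j \le 5q} S_{f_j}$.
2. $f_j(I^*) = f_j(I_i) = 0$ for all the predicates $f_j \in \{f_{i+1}, \ldots, f_{5q}\}$ output by $A_3$.


Since $A_1$ chooses $i \in \mathcal{I}$ with probability at least 4/5, this proves the lemma.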
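The counting argument in the claim can be checked mechanically. The following Python sketch (the function names are illustrative and do not come from the paper) lists the indices $i$ whose set is not contained in the union of the later sets, and confirms on random set systems over $[q]$ that at most $q$ such indices occur among $5q$ sets.

```python
import random


def uncovered_indices(sets):
    """Return the 1-based indices i with sets[i-1] not contained in the
    union of all later sets (the union is empty for the last index)."""
    bad = []
    suffix_union = set()
    # Walk from the last set to the first, maintaining the union of later sets.
    for i in range(len(sets), 0, -1):
        if not sets[i - 1] <= suffix_union:
            bad.append(i)
        suffix_union |= sets[i - 1]
    return sorted(bad)


def random_set_system(q, count, rng):
    """Draw `count` random subsets of {1, ..., q}."""
    return [{x for x in range(1, q + 1) if rng.random() < 0.5}
            for _ in range(count)]


q = 4
rng = random.Random(0)
for _ in range(1000):
    system = random_set_system(q, 5 * q, rng)
    # Each uncovered index strictly grows the suffix union inside [q],
    # so there can be at most q of them.
    assert len(uncovered_indices(system)) <= q
```

Each uncovered index adds at least one new element of $[q]$ to the running union, which is exactly why the count cannot exceed $q$.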
We now apply Lemma 1 to several specific cases.


Identity-based encryption. It is easy to see that IBE for identities $\{\mathcal{I}_n\}$
can be viewed as an instance of predicate encryption by setting $\mathcal{A}_n = \mathcal{I}_n$ and
$\mathcal{F}_n = \{f_{ID}\}_{ID \in \mathcal{I}_n}$ where


$$f_{ID}(ID') \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{if } ID' = ID \\ 0 & \text{otherwise.} \end{cases}$$


Let $N = |\mathcal{I}_n|$ denote the size of the identity space. Boneh et al. already
rule out black-box constructions of IBE from trapdoor permutations for
$N = \omega(\mathrm{poly}(n))$; the next theorem shows that our Theorem 1 generalizes their result:

Theorem 2. There is no black-box construction (from trapdoor permutations
or CCA-secure encryption) of an IBE scheme for 5N identities where each
algorithm makes fewer than N queries to its oracle.

As a corollary, there is no black-box construction of an IBE scheme (from
trapdoor permutations or CCA-secure encryption) for a super-polynomial number
of identities.


Proof. Let $\mathcal{I}_n = \{ID_1, \ldots, ID_{5N}\}$. It is not hard to see that $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$
can be N-covered: take $f_{ID_1}, \ldots, f_{ID_{5N}}$ and set $I_i = ID_i$ for all $i$. Then apply
Theorem 1.


Forward-secure public-key encryption. In a forward-secure public-key en- 
cryption scheme [J secret keys are associated with time periods; the secret key 
at time period $i$ enables decryption for ciphertexts encrypted at any time $j \ge i$.
(We refer the reader to [ for further discussion.) A forward-secure encryption 
scheme supporting $N = N(n)$ time periods can be cast as a predicate encryption
scheme by letting $\mathcal{A}_n = \{1, \ldots, N\}$ and $\mathcal{F}_n = \{f_i\}_{1 \le i \le N}$ where


$$f_i(j) \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{if } j \ge i \\ 0 & \text{otherwise.} \end{cases}$$


(A forward-secure encryption scheme imposes the additional requirement that
$SK_{f_{i+1}}$ can be derived from $SK_{f_i}$; since we do not impose this requirement
our impossibility result is even stronger.) A black-box construction of a forward-
secure encryption scheme from any CPA-secure encryption scheme exists for any
$N = \mathrm{poly}(n)$: the master public key contains public keys $\{pk_1, \ldots, pk_N\}$, and the
secret key at period $i$ is $SK_{f_i} = \{sk_i, \ldots, sk_N\}$; encryption at period $j$ uses $pk_j$.
While such a scheme is trivial as far as forward-secure encryption goes (since 
the public/secret key lengths are linear in N), it satisfies the definition. The 
next theorem indicates that, in some sense, this trivial construction is almost 
optimal as far as black-box constructions are concerned; moreover, there is no 
black-box construction supporting a super-polynomial number of time periods. 
(In contrast, there exist schemes based on specific assumptions that support 
an unbounded number of time periods.) 


Theorem 3. There is no black-box construction (from trapdoor permutations or 
CCA-secure encryption) of a forward-secure encryption scheme for 5N periods 
where each algorithm in the scheme makes fewer than N queries to its oracle. 

As a corollary, there is no black-box construction of a forward-secure encryp- 
tion scheme (from trapdoor permutations or CCA-secure encryption) supporting 
a super-polynomial number of time periods. 


Proof. $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$ can be N-covered, as taking $f_1, \ldots, f_{5N}$ and setting $I_i = i$
for all $i$ satisfies the conditions of Lemma 1. Then apply Theorem 1.


Broadcast encryption. Finally, we look at the case of (public-key) broadcast
encryption [9]. Here, there is a fixed public key and a set of users $\mathcal{U} = \{1, \ldots, U\}$
each with their own personal secret key; it should be possible for a sender to
encrypt a message in such a way that only some subset $\mathcal{U}' \subseteq \mathcal{U}$ of users can
decrypt. Consider the case where at most $k = k(n) < U$ users are excluded;
we refer to this as k-exclusion broadcast encryption. This can also be modeled
by predicate encryption, if we let $\mathcal{A}_n = \{\mathcal{U}' \subseteq \mathcal{U} \mid |\mathcal{U}'| \ge U - k\}$ and define
$\mathcal{F}_n = \{f_i\}_{i \in \mathcal{U}}$ where

$$f_i(\mathcal{U}') \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{if } i \in \mathcal{U}' \\ 0 & \text{otherwise.} \end{cases}$$

Theorem 4. There is no black-box construction (from trapdoor permutations or 
CCA-secure encryption) of a (5k)-exclusion broadcast encryption scheme where 
each algorithm in the scheme makes k or fewer queries to its oracle. 

As a corollary, there is no black-box construction of a k-exclusion broadcast 
encryption scheme (from trapdoor permutations or CCA-secure encryption) for 
super-polynomial k. 


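Proof. We show that $\{(\mathcal{F}_n, \mathcal{A}_n)\}_{n \in \mathbb{N}}$ can be k-covered. Take $f_1, \ldots, f_{5k}$ and define


$$I_i \stackrel{\mathrm{def}}{=} \mathcal{U} \setminus \{i+1, \ldots, 5k\}$$


for $i \in \{1, \ldots, 5k\}$. (So $I_{5k} = \mathcal{U}$.) Note that $|I_i| \ge U - 5k$ always, and these
satisfy the conditions of Lemma 1. Applying Theorem 1 concludes the proof.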
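The three instantiations share one pattern: a list of predicates and attributes such that each predicate accepts its own attribute while every later predicate rejects it. The Python sketch below (helper names are hypothetical; predicates are modeled as plain functions returning 0 or 1) checks this condition for small versions of the IBE, forward-secure, and broadcast families described above.

```python
def satisfies_lemma(preds, attrs):
    """Check f_i(I_i) = 1 and f_j(I_i) = 0 for every j > i."""
    m = len(preds)
    return all(
        preds[i](attrs[i]) == 1
        and all(preds[j](attrs[i]) == 0 for j in range(i + 1, m))
        for i in range(m)
    )


def ibe_family(m):
    """Equality predicates f_ID(ID') = 1 iff ID' = ID, with I_i = ID_i."""
    ids = list(range(1, m + 1))
    preds = [lambda x, ident=ident: int(x == ident) for ident in ids]
    return preds, ids


def forward_secure_family(m):
    """Predicates f_i(j) = 1 iff j >= i, with I_i = i."""
    periods = list(range(1, m + 1))
    preds = [lambda j, i=i: int(j >= i) for i in periods]
    return preds, periods


def broadcast_family(num_users, m):
    """Predicates f_i(U') = 1 iff i in U', with I_i = U minus {i+1, ..., m}."""
    users = set(range(1, num_users + 1))
    preds = [lambda s, i=i: int(i in s) for i in range(1, m + 1)]
    attrs = [frozenset(users - set(range(i + 1, m + 1)))
             for i in range(1, m + 1)]
    return preds, attrs


assert satisfies_lemma(*ibe_family(10))
assert satisfies_lemma(*forward_secure_family(10))
assert satisfies_lemma(*broadcast_family(20, 10))
```

In the broadcast case with $m = 5k$, every attribute excludes at most $5k$ users, which matches the $(5k)$-exclusion setting of Theorem 4.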


5 A Possibility Result for Predicate Encryption 


Here we show that for the class of predicates and attributes $\{(\mathcal{F}_n, \mathcal{A}_n)\}$ where
(roughly) for each $I \in \mathcal{A}_n$ there are at most polynomially-many $f \in \mathcal{F}_n$ with
$f(I) = 0$, there is a black-box construction of a predicate encryption scheme
over $\{(\mathcal{F}_n, \mathcal{A}_n)\}$ based on any CPA-secure encryption scheme. We remark that
while we only prove payload hiding, our construction can in fact be shown to be
attribute hiding [3] as well.

Our construction relies on the notion of an (N, k)-cover free family [8]: 


Definition 6. An (N, k)-cover free family over $[U]$ is a family $\mathcal{S} = \{S_1, \ldots, S_N\}$,
with $S_i \subseteq [U]$, such that for any distinct sets $S, S_1, \ldots, S_k \in \mathcal{S}$ it holds that


$$S \setminus \bigcup_{i=1}^{k} S_i \ne \emptyset.$$


For any $k = \mathrm{poly}(n)$ and $N = 2^{\mathrm{poly}(n)}$ there exist explicit, polynomial-
time constructions of an (N, k)-cover free family over $[U]$ with $|U| = \mathrm{poly}(n)$.
(The specific results of can be used to improve the efficiency of the con- 
struction that follows, but our only goal here is to show a construction that can 
be implemented in polynomial time.) 
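To make Definition 6 concrete, here is a brute-force Python check of the cover-free property. It is only a sketch for tiny families (it enumerates all groups of $k$ other sets, so its running time is exponential in $k$); the function name is illustrative.

```python
from itertools import combinations


def is_cover_free(family, k):
    """Return True if no set in `family` is contained in the union of
    k other sets of the family."""
    n = len(family)
    for idx, s in enumerate(family):
        others = [family[j] for j in range(n) if j != idx]
        # Checking groups of exactly min(k, len(others)) sets suffices,
        # because a union of fewer sets is contained in a larger union.
        for group in combinations(others, min(k, len(others))):
            if s <= set().union(*group):
                return False
    return True


singletons = [{1}, {2}, {3}]
pairs = [{1, 2}, {2, 3}, {1, 3}]
assert is_cover_free(singletons, 2)
assert is_cover_free(pairs, 1)
assert not is_cover_free(pairs, 2)  # {1,2} lies inside {2,3} | {1,3}
```

The family of singletons is the trivial example with $U = N$; the point of the explicit constructions cited in the text is to reach $N = 2^{\mathrm{poly}(n)}$ with only polynomial $U$.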


Theorem 5. Fix $\{(\mathcal{F}_n, \mathcal{A}_n)\}$ and set $\mathrm{Neg}_I \stackrel{\mathrm{def}}{=} \{f \in \mathcal{F}_n : f(I) = 0\}$ for $I \in \mathcal{A}_n$. If
there is a poly-time algorithm ListNeg for which $\mathsf{ListNeg}(1^n, I) = \mathrm{Neg}_I$, then there
is a black-box construction of a predicate encryption scheme over $\{(\mathcal{F}_n, \mathcal{A}_n)\}$
from any CPA-secure encryption scheme.

Proof. Since ListNeg runs in polynomial time, there is a polynomial $k$ for which
$|\mathrm{Neg}_I| \le k(n)$ for all $I \in \mathcal{A}_n$. Say predicates in $\mathcal{F}_n$ can be represented using
$\ell(n) = \mathrm{poly}(n)$ bits. Let $\{U_n\}$ be such that $U_n = \mathrm{poly}(n)$ and such that, for
each $n$, there is an explicit $(2^{\ell(n)}, k(n))$-cover free family $\mathcal{S} = \{S_1, \ldots, S_{2^{\ell(n)}}\}$
over $[U_n]$. Identifying $\mathcal{F}_n$ with a subset of $[2^{\ell(n)}]$, we can view the cover-free
family as $\mathcal{S} = \{S_f\}_{f \in \mathcal{F}_n}$.

Let (Gen’, Enc’, Dec’) be a CPA-secure encryption scheme. Our construction 
of a predicate encryption scheme over {(Fn, An)} is as follows: 


— Setup, on input $1^n$ and a sufficiently long random string $MSK$, runs $\mathsf{Gen}'(1^n)$
a total of $U = U_n$ times to generate keys $(pk_1, sk_1), \ldots, (pk_U, sk_U)$. The
master public key is $\{pk_1, \ldots, pk_U\}$.

— KeyGen, given the secret keys $\{sk_i\}_{i=1}^{U}$ and a predicate $f \in \mathcal{F}_n$, outputs the
subset $\{sk_i\}_{i \in S_f}$.

— Enc, given the public key, an attribute $I \in \mathcal{A}_n$, and a message $m$, computes
$\mathrm{Neg}_I = \mathsf{ListNeg}(I)$ and sets $\tilde{U} = [U] \setminus \bigl(\bigcup_{f \in \mathrm{Neg}_I} S_f\bigr)$. The ciphertext is
$(I, \{C_i\}_{i \in \tilde{U}})$ where $C_i \leftarrow \mathsf{Enc}'_{pk_i}(m)$.

— Dec, given the secret key $\{sk_i\}_{i \in S_f}$ for a predicate $f$ and a ciphertext
$(I, \{C_i\}_{i \in \tilde{U}})$ for which $f(I) = 1$, first finds an index $i$ for which $i \in S_f \cap \tilde{U}$.
(Such an index must exist, since


$$S_f \cap \tilde{U} = S_f \setminus \bigcup_{f' \in \mathrm{Neg}_I} S_{f'}$$


and there are at most $k$ predicates $f'$ that the union is taken over.) The
output is $\mathsf{Dec}'_{sk_i}(C_i)$.


It is easy to see that the above construction satisfies correctness. We now prove 
security (in the sense of Definition B). Let A be an adversary attacking the 
scheme. We may assume without loss of generality that A never requests a 
secret key for a predicate f for which f(I*) = 1 (where I* is the attribute used 
to encrypt the challenge ciphertext), since A cannot succeed if that occurs. 

For simplicity we prove security in a non-uniform model, but the proof can be
modified easily to hold in the uniform model in the standard way. We consider
$U+1$ hybrid experiments $H_0, \ldots, H_{U+1}$, where $H_0$ corresponds to the security
experiment when $b = 0$ is encrypted, and $H_{U+1}$ corresponds to the security experiment
when $b = 1$ is encrypted. Let $\delta_i$ denote the probability that A
outputs ‘0’ in $H_i$. We show that $|\delta_i - \delta_{i+1}|$ is negligible for all $i$; since $U = U_n$
is polynomial in $n$, this proves that $|\delta_0 - \delta_{U+1}|$ is negligible and thus completes
the proof.
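The step from adjacent hybrids to the two end points is the usual telescoping bound, spelled out here for convenience (with $\delta_i$ the probability that A outputs 0 in hybrid $H_i$, as in the text):

```latex
\[
  |\delta_0 - \delta_{U+1}|
    = \Bigl|\sum_{i=0}^{U} (\delta_i - \delta_{i+1})\Bigr|
    \le \sum_{i=0}^{U} |\delta_i - \delta_{i+1}|
    \le (U+1) \cdot \max_{0 \le i \le U} |\delta_i - \delta_{i+1}| .
\]
```

Since $U + 1$ is polynomial in $n$ and each term is negligible, the right-hand side is negligible as well.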

Experiment $H_i$ is defined as follows: Steps 1 and 2 are exactly as in the security
experiment. In step 3, however, when encrypting the challenge ciphertext for the
attribute $I^*$, let $\tilde{U}^* = [U] \setminus \bigcup_{f \in \mathrm{Neg}_{I^*}} S_f$ and set the ciphertext equal to
$(I^*, \{C_j\}_{j \in \tilde{U}^*})$, where


$$C_j \leftarrow \begin{cases} \mathsf{Enc}'_{pk_j}(1) & j < i \\ \mathsf{Enc}'_{pk_j}(0) & j \ge i. \end{cases}$$


A may continue to request secret keys as in the security experiment.

We now prove that $|\delta_j - \delta_{j+1}|$ is negligible for any $j$. Fix $j$ and consider the
following adversary $A'$ attacking the underlying encryption scheme (Gen′, Enc′,
Dec′). Given public key $pk$ and ciphertext $C$ (which is either an encryption of 0
or 1), the adversary $A'$ proceeds as follows:


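1. Set $pk_j = pk$. For $i \ne j$, compute $(pk_i, sk_i) \leftarrow \mathsf{Gen}'(1^n)$. Give the master
   public key $\{pk_1, \ldots, pk_U\}$ to A.

2. When A requests a secret key for a predicate $f$, then if $j \notin S_f$ give to A the
   secret keys $\{sk_i\}_{i \in S_f}$. Otherwise, abort and output a random bit.

3. When A outputs $I^*$, compute $\mathrm{Neg}_{I^*} = \mathsf{ListNeg}(I^*)$ and then set


   $$\tilde{U}^* = [U] \setminus \bigcup_{f \in \mathrm{Neg}_{I^*}} S_f.$$


   If $j \notin \tilde{U}^*$ then abort and output a random bit. Otherwise, give A the
   ciphertext $(I^*, \{C_i\}_{i \in \tilde{U}^*})$ where


   $$C_i \leftarrow \begin{cases} \mathsf{Enc}'_{pk_i}(1) & i < j \\ C & i = j \\ \mathsf{Enc}'_{pk_i}(0) & i > j. \end{cases}$$

4. Subsequent secret key queries made by A are answered as before. Finally,
   $A'$ outputs whatever bit is output by A.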


Let $\Pr_j[\cdot]$ denote the probability of an event in experiment $H_j$. We have

$$\bigl|\Pr[A' \text{ outputs } 0 \mid C \leftarrow \mathsf{Enc}'_{pk}(0)] - \Pr[A' \text{ outputs } 0 \mid C \leftarrow \mathsf{Enc}'_{pk}(1)]\bigr|$$
$$= \bigl|\Pr[j \in \tilde{U}^*] \cdot \Pr_j[A \text{ outputs } 0 \mid j \in \tilde{U}^*] - \Pr[j \in \tilde{U}^*] \cdot \Pr_{j+1}[A \text{ outputs } 0 \mid j \in \tilde{U}^*]\bigr|,$$


using the facts that (1) $\Pr[j \in \tilde{U}^*]$ is independent of whether $C$ is an encryption
of 0 or 1 and (2) when $C$ is an encryption of 0 (resp., 1) then the view of A
(assuming $j \in \tilde{U}^*$) is identical to its view in $H_j$ (resp., $H_{j+1}$). Note further that


$$\Pr_j[A \text{ outputs } 0 \mid j \notin \tilde{U}^*] = \Pr_{j+1}[A \text{ outputs } 0 \mid j \notin \tilde{U}^*]$$


since the challenge ciphertext is distributed identically in each case. It follows
that

$$\bigl|\Pr[A' \text{ outputs } 0 \mid C \leftarrow \mathsf{Enc}'_{pk}(0)] - \Pr[A' \text{ outputs } 0 \mid C \leftarrow \mathsf{Enc}'_{pk}(1)]\bigr|$$
$$= \bigl|\Pr[j \in \tilde{U}^*] \cdot \Pr_j[A \text{ outputs } 0 \mid j \in \tilde{U}^*] - \Pr[j \in \tilde{U}^*] \cdot \Pr_{j+1}[A \text{ outputs } 0 \mid j \in \tilde{U}^*]\bigr|$$
$$= |\delta_j - \delta_{j+1}|,$$


concluding the proof.

A Proof Details


We analyze the success probability of the adversary A from Section BI] Due to 
space limitations, the proof cannot be reproduced here in its entirety; we have 
instead aimed to describe those parts of our proof that differ most prominently 
from the proof of Boneh et al. [5]. The most significant new element in our proof
is Proposition 1.

Toward analyzing the success probability of A, we describe a series of ex- 
periments, the first of which corresponds to adversary A interacting in the ex- 
periment from Definition 2] We show that, as long as no “bad” events (to be 
defined later) occur, the statistical distance between the transcripts generated in 
each of these experiments is not too large. This allows us to bound A’s success 
probability by comparing it to an appropriate event in the final experiment. 


$\mathsf{Expt}_0$: This corresponds to A interacting in the experiment from Definition 2.


$\mathsf{Expt}_1$: This is the same as $\mathsf{Expt}_0$ except that $O''$ (as defined after the $k$-th
repetition of step 3) is used instead of O to compute the challenge ciphertext $C^*$.


$\mathsf{Expt}_2$: This is the same as $\mathsf{Expt}_1$ except that $O''$ never queries O (cf. step 3 in
the definition of $O''$); instead, any such queries are answered randomly (subject
to ensuring that $O''$ remains consistent).


$\mathsf{Expt}_3$: This is the following experiment with no adversary and using the real
oracle O:


Setup and challenge


1. Compute $f^* \leftarrow A_1(1^n)$, $I^* = A_2(1^n, f^*)$, and $\{f_1, \ldots, f_p\} \leftarrow A_3(1^n, f^*)$.

2. Choose at random $MSK \leftarrow \{0,1\}^n$ and compute $MPK := \mathsf{Setup}^{O}(MSK)$.
   If $f_i(I^*) = 1$ for some $i$, abort and output a random bit.

3. For every predicate $f \in \{f^*, f_1, \ldots, f_p\}$ compute $SK_f := \mathsf{KeyGen}^{O}_{MSK}(f)$.

Step 1: Discovering important public keys. For $i = 1$ to $p$ do:


1. Compute $I_{f_i} \leftarrow A_2(1^n, f_i)$, and choose random $b_i \leftarrow \{0,1\}$ and $r_i \leftarrow \{0,1\}^n$.
2. Compute $\mathsf{Dec}^{O}_{SK_{f_i}}(\mathsf{Enc}^{O}_{MPK}(I_{f_i}, b_i; r_i))$.


Step 2: Decrypting the challenge


1. Choose $r \leftarrow \{0,1\}^n$, $b \leftarrow \{0,1\}$ and compute $C^* := \mathsf{Enc}^{O}_{MPK}(I^*, b; r)$.
2. Compute $b' := \mathsf{Dec}^{O}_{SK_{f^*}}(C^*)$ and output $b'$. Note that $b' = b$ always.


This completes the description of $\mathsf{Expt}_3$.


For i € {0,1,2} we will be interested in the following transcripts defined in the 
course of Expt;. These transcripts contain, in particular, all oracle queries /answers. 
— tra iS aaa The transcript of the setup phase. This includes the computation 

of MPK and SKy,,...,S5Kz,, as well as the computation of SK p» for the 

f* chosen by the adversary. (Even though Sy. is not computed in the 

experiment, SK p» is well defined given f*, MSK, and O.) 
= trans) j..: The transcript of step 1 (“discovering important public keys” ). 

— tranS reg: T he transcript of step 2 (“discovering frequent queries for I*”). 

— tranStim-setup: This is the transcript defined by the adversary’s choice of 
MSK’ and ©’ in the kt? repetition of step 3, and can be viewed as the 
adversary’s “guess” for ENS: pois 

— transi: The transcript of the encryption of C/decryption of C* in the k*t? 
repetition of step 3. 


t — t 
— trans’ = (transscrup; 


4 
sim-setup? 


i 


i 
trans)... trans transi). 


For Expt; we define 


= trans? ee The transcript of the “setup and challenge” step. 


— trans?,.: The transcript of step 1 (“discovering important public keys”). 
— trans’: The transcript of step 2 (“decrypting the challenge” ). 
— trans? = (trans®,,, trans?;-setup» trans). 

For a given transcript, we partition the set of public keys used (i.e., the set of 
pk’s for which [g(-) = pk] € trans) into the following sets: 


— We let $Q_S(\mathsf{trans})$ denote the public keys queried during execution of Setup:

$$Q_S(\mathsf{trans}) \stackrel{\mathrm{def}}{=} \{pk \mid \text{the query } [g(\cdot) = pk] \in \mathsf{trans} \text{ is asked by Setup}\}.$$


Intuitively, these are the pk’s whose corresponding sk’s are “useful” for
decrypting ciphertexts.

— We let $Q_K(\mathsf{trans})$ denote the public keys queried by the KeyGen algorithm
when some personal secret key is derived:


$$Q_K(\mathsf{trans}) \stackrel{\mathrm{def}}{=} \{pk \mid [g(\cdot) = pk] \in \mathsf{trans} \text{ is asked by } \mathsf{KeyGen}_{MSK}(\cdot)\}$$


$$Q_{K-S}(\mathsf{trans}) = Q_K(\mathsf{trans}) \setminus Q_S(\mathsf{trans}).$$


— Finally, we will also look at the public keys “discovered” during encryption
and decryption (cf. step 3 of the experiments):


$$Q_{\mathsf{Enc}+\mathsf{Dec}}(\mathsf{trans}, I, f) = \{pk \mid [g(\cdot) = pk] \text{ asked by } \mathsf{Dec}_{SK_f}(\mathsf{Enc}_{MPK}(I, \cdot))\}$$

A.1 Bounding Probabilities of Bad Events


Fixing the master secret key $MSK$ and the oracle O (this fixes $MPK$ as well
as $\{SK_f\}_{f \in \mathcal{F}_n}$), we define four “bad” events and bound the probabilities of each
of them. Here, we will only describe and bound one of these events; we refer to
the full version of our paper for the remainder of the proof.

Let Ei be the event that either of the following is true (in Expt,): 


1. 3f: € {f1,---, fp} such that f;(I*) = 1. 
2. The following condition holds: 


Qrnc+pec(trans), I f=) N Qs(trans) im setup) 
¢ U Qenc+pecltransi gs, Iş, f) N Qs(tranS$im-setup)» 


where Ip := A2(1”, f). 


Intuitively, the second condition above is the event that the public keys that are
“useful” for $f_1, \ldots, f_p$ do not contain the public keys that are “useful” for $f^*$.

We bound the probability of E}, ç using the assumed easily-covered property 
of {(Fn, An)}; this is the crux of our proof, and is what motivates Definition B} 


Proposition 1. PrE} c] < 1/5. 


Proof. Fix O and MSK € {0,1}", thus fixing trans$;im-setup: If for each f € Fn 
we fix a random tape ry that is sufficiently long to run Decsg,(Encmpx (J, b; r)) 


(where I qf Ao(f)), then this defines, for each f, the set 

Sf 

= {p k | [g(-) = pk] asked by Decsx, (Encypx (J, b; r))} N Qs(tranSžim-setup)- 
Numbering the (at most q) public keys in Qs(trans3;m-setup) in lexicographic 


order, we can view these {Sy} yer, as an F,-set system over |q]. The fact that 
{(Fn, An)} can be g-covered implies that there exists a polynomial p such that 


Vf E Fn ire — {0,1}* p 3 
Pr | f* — Ay, I* := Aə(1”, f*) i (s+ = Ús) VAN (vi : ar) 2 0) > = 
fieco} = Al") i= 


The above is a lower bound on the probability that Eœ does not occur.


Abstract. This paper presents a hierarchical predicate encryption
(HPE) scheme for inner-product predicates that is secure (selectively
attribute-hiding) in the standard model under new assumptions. These
assumptions are non-interactive and of fixed size in the number of
adversary’s queries (i.e., not “q-type”), and are proven to hold in the generic
model. To the best of our knowledge, this is the first HPE (or
delegatable PE) scheme for inner-product predicates that is secure in the
standard model. The underlying techniques of our result are based on a
new approach on bilinear pairings, which is extended from bilinear
pairing groups over linear spaces. They are quite different from the existing
techniques and may be of independent interest.


1 Introduction 


1.1 Background 


The notion of predicate encryption (PE) was explicitly presented by Katz, Sahai 
and Waters as a generalized (fine-grained) notion of encryption that covers 
identity-based encryption (IBE) PBBEPIOIS], hidden-vector encryption (HVE) 
[Z] and attribute-based encryption (ABE) ME3M9RORI]. 

Informally, secret keys in a predicate encryption scheme correspond to
predicates in some class F, and a sender associates a ciphertext with an attribute in
a set X; a ciphertext associated with the attribute $I \in X$ can be decrypted by
secret key $sk_f$ corresponding to the predicate $f \in F$ if and only if $f(I) = 1$.

In addition, [16] defined attribute-hiding, a stronger security notion for PE
than the basic security requirement, payload-hiding. Roughly speaking,
attribute-hiding requires that a ciphertext conceal the associated attribute as
well as the plaintext, while payload-hiding only requires that a ciphertext
conceal the plaintext. If attributes are identities, i.e., PE is IBE, attribute-hiding
PE implies anonymous IBE.

Katz, Sahai and Waters also presented a concrete construction of PE for
a class of predicates called inner-product predicates, which represents a wide
class of predicates that includes an equality test (for IBE and HVE),
disjunctions or conjunctions of equality tests, and, more generally, arbitrary CNF or
DNF formulas (for ABE). Informally, an attribute of inner-product predicates
is expressed as vector $\vec{x}$ and predicate $f_{\vec{v}}$ is associated with vector $\vec{v}$, where
$f_{\vec{v}}(\vec{x}) = 1$ iff $\vec{x} \cdot \vec{v} = 0$. (Here, $\vec{x} \cdot \vec{v}$ denotes the standard inner-product.)
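As a small illustration of how an inner-product predicate can express an equality test, consider the following Python sketch. The two-dimensional encoding used here (attribute $(1, id)$ and predicate vector $(id', -1)$ over $\mathbb{F}_q$) is one simple choice made for this example and is labeled as an assumption; the function names and the modulus are likewise hypothetical.

```python
Q = 101  # small prime standing in for the field order q (assumption)


def inner_product_predicate(v, q=Q):
    """Return f_v with f_v(x) = 1 iff the inner product of x and v is 0 mod q."""
    def f(x):
        return int(sum(a * b for a, b in zip(x, v)) % q == 0)
    return f


def equality_attribute(identity, q=Q):
    """Illustrative encoding of an identity as the attribute vector (1, id)."""
    return (1, identity % q)


def equality_predicate(identity, q=Q):
    """Illustrative predicate vector (id', -1): the inner product with
    (1, id) equals id' - id, which is 0 exactly when the identities match."""
    return inner_product_predicate((identity % q, q - 1), q)


f = equality_predicate(5)
assert f(equality_attribute(5)) == 1
assert f(equality_attribute(6)) == 0
```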

Although the Katz-Sahai-Waters scheme is the most expressive attribute-
hiding PE among the existing schemes, no delegation functionality was proposed.
Shi and Waters [22] presented a delegation mechanism for a class of PE, but the
admissible predicates of the system, which is a class of equality tests for HVE,
are more limited than inner-product predicates in [16]. Okamoto and Takashima
presented hierarchical delegation of PE for inner-product predicates, but the
security proof was only given in the generic model.


1.2 Our Results

This paper addresses the above problems in [16,22,18].


— This paper proposes a hierarchical predicate encryption (HPE) scheme for 
inner-product predicates, where a (natural) hierarchical delegation system 
of inner-product predicates is provided e.g., our hierarchical system is con- 
sistent with that for hierarchical IBE (HIBE) BAIA (i.e., our HPE is 
specialized to anonymous HIBE, if the predicate of HPE is specified to the 
equality test of identities). 

— The proposed HPE scheme is selectively attribute-hiding against chosen- 
plaintext-attacks (CPA) in the standard model under two new assumptions, 
the RDSP and IDSP assumptions. These assumptions are non-interactive, 
falsifiable and of fixed size in the number of adversary’s queries (i.e., not 
“q-type” ), and are proven to hold in the generic model. 

— To achieve the result, this paper advances an approach recently developed in 
CMS]. This approach is extended from bilinear pairing groups into higher 
dimensional vector spaces, and a notion, dual pairing vector spaces (DPVS), 
is employed in this paper. (We will explain this approach below.) 

One of the most basic decisional assumptions in this approach is the
decisional subspace problem (DSP) assumption. (It is a higher-dimensional
generalization of the decisional DH and Linear assumptions, and the
relationships of this assumption with the traditional ones are studied in [17].)

The assumptions introduced in this paper, the RDSP and IDSP
assumptions, are variants of the DSP assumption in DPVS.

— The performance of the proposed HPE scheme is almost the same as (or
slightly worse than) that in [18], where the dimension of DPVS for our HPE
scheme is $n + 3$, whereas that for [18] is $n + 2$, when $n$ is the dimension of
predicate/attribute vectors.

— Since HPE is a generalized (fine-grained) version of anonymous HIBE 
(AHIBE) (or includes AHIBE as a special case), HPE covers (a generalized 
version of) applications described in B], fully private communication and 
search on encrypted data. For example, we can use a two-level HPE scheme 
where the first level corresponds to the predicate/attribute of (single-layer) 
PE and the second level corresponds to those of “attribute search by a
predicate” (generalized “key-word search”).

1.3 A New Approach — Dual Pairing Vector Spaces


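We now explain how the approach works by using a typical construction example
on direct products of pairing groups $(q, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, g_1, g_2, g_T, e)$, where $q$ is a
prime, $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ are cyclic groups of order $q$, $g_i$ is a generator of $\mathbb{G}_i$
($i = 1, 2$), $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ is a non-degenerate bilinear pairing operation,
and $g_T := e(g_1, g_2) \ne 1$. Here we denote the group operation of $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$
by multiplication. Note that this construction also works on symmetric pairing
groups, where $\mathbb{G}_1 = \mathbb{G}_2$. As for the definitions of some notations, see Section 1.5.


Vector spaces $\mathbb{V}$ and $\mathbb{V}^*$: $\mathbb{V} := \overbrace{\mathbb{G}_1 \times \cdots \times \mathbb{G}_1}^{N}$ and $\mathbb{V}^* := \overbrace{\mathbb{G}_2 \times \cdots \times \mathbb{G}_2}^{N}$,
whose elements are expressed by N-dimensional vectors, $\boldsymbol{x} := (g_1^{x_1}, \ldots, g_1^{x_N})$
and $\boldsymbol{y} := (g_2^{y_1}, \ldots, g_2^{y_N})$, respectively ($x_i, y_i \in \mathbb{F}_q$ for $i = 1, \ldots, N$).

Canonical bases $\mathbb{A}$ and $\mathbb{A}^*$: $\mathbb{A} := (\boldsymbol{a}_1, \ldots, \boldsymbol{a}_N)$ of $\mathbb{V}$, where $\boldsymbol{a}_1 := (g_1, 1, \ldots, 1)$,
$\boldsymbol{a}_2 := (1, g_1, 1, \ldots, 1)$, ..., $\boldsymbol{a}_N := (1, \ldots, 1, g_1)$. $\mathbb{A}^* := (\boldsymbol{a}^*_1, \ldots, \boldsymbol{a}^*_N)$ of $\mathbb{V}^*$,
where $\boldsymbol{a}^*_1 := (g_2, 1, \ldots, 1)$, $\boldsymbol{a}^*_2 := (1, g_2, 1, \ldots, 1)$, ..., $\boldsymbol{a}^*_N := (1, \ldots, 1, g_2)$.

Pairing operation: $e(\boldsymbol{x}, \boldsymbol{y}) := \prod_{i=1}^{N} e(g_1^{x_i}, g_2^{y_i}) = e(g_1, g_2)^{\sum_{i=1}^{N} x_i y_i} = g_T^{\vec{x} \cdot \vec{y}} \in \mathbb{G}_T$
for the above $\boldsymbol{x} \in \mathbb{V}$ and $\boldsymbol{y} \in \mathbb{V}^*$.

Base change: Canonical basis $\mathbb{A}$ is changed to basis $\mathbb{B} := (\boldsymbol{b}_1, \ldots, \boldsymbol{b}_N)$ of $\mathbb{V}$ using
a uniformly chosen (regular) linear transformation, $X := (\chi_{i,j}) \xleftarrow{U} GL(N, \mathbb{F}_q)$,
such that $\boldsymbol{b}_i = \sum_{j=1}^{N} \chi_{i,j} \boldsymbol{a}_j$ ($i = 1, \ldots, N$). $\mathbb{A}^*$ is also changed to basis
$\mathbb{B}^* := (\boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_N)$ of $\mathbb{V}^*$, such that $(\vartheta_{i,j}) := (X^T)^{-1}$, $\boldsymbol{b}^*_i = \sum_{j=1}^{N} \vartheta_{i,j} \boldsymbol{a}^*_j$
($i = 1, \ldots, N$). We see that $e(\boldsymbol{b}_i, \boldsymbol{b}^*_j) = g_T^{\delta_{i,j}}$ ($\delta_{i,j} = 1$ if $i = j$, and $\delta_{i,j} = 0$ if
$i \ne j$), i.e., $\mathbb{B}$ and $\mathbb{B}^*$ are dual orthonormal bases of $\mathbb{V}$ and $\mathbb{V}^*$.

Intractable Problem: One of the most natural decisional problems in our
approach is the decisional subspace problem (DSP) [E]. The $\mathrm{DSP}_{(N_1, N_2)}$
assumption is: it is hard to tell $\boldsymbol{v} := v_{N_2+1} \boldsymbol{b}_{N_2+1} + \cdots + v_{N_1} \boldsymbol{b}_{N_1}$ from
$\boldsymbol{u} := v_1 \boldsymbol{b}_1 + \cdots + v_{N_1} \boldsymbol{b}_{N_1}$, where $(v_1, \ldots, v_{N_1}) \xleftarrow{U} \mathbb{F}_q^{N_1}$ and $N_2 + 1 < N_1$. DSP is
intractable if the generalized DDH or DLIN problem is intractable [17].

Trapdoor: Although the DSP problem is assumed to be intractable, it can
be efficiently solved by using trapdoor $\boldsymbol{t}^* \in \mathrm{span}\langle \boldsymbol{b}^*_1, \ldots, \boldsymbol{b}^*_{N_2} \rangle$. Given
$\boldsymbol{v} := v_{N_2+1} \boldsymbol{b}_{N_2+1} + \cdots + v_{N_1} \boldsymbol{b}_{N_1}$ or $\boldsymbol{u} := v_1 \boldsymbol{b}_1 + \cdots + v_{N_1} \boldsymbol{b}_{N_1}$, we can tell $\boldsymbol{v}$ from
$\boldsymbol{u}$ using $\boldsymbol{t}^*$ since $e(\boldsymbol{v}, \boldsymbol{t}^*) = 1$ and $e(\boldsymbol{u}, \boldsymbol{t}^*) \ne 1$ with high probability.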
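The base change and the trapdoor can be illustrated by working purely "in the exponent". In the Python sketch below, a group element vector $(g_1^{x_1}, \ldots, g_1^{x_N})$ is represented by its exponent vector $(x_1, \ldots, x_N)$ over $\mathbb{F}_q$, so the pairing $e(\boldsymbol{x}, \boldsymbol{y}) = g_T^{\vec{x} \cdot \vec{y}}$ becomes an inner product mod $q$, and $e(\cdot,\cdot) = 1$ corresponds to exponent 0. This representation reveals discrete logarithms and therefore has no security; it only demonstrates the linear algebra. The modulus and all function names are assumptions made for the example.

```python
import random

Q = 2**61 - 1  # prime standing in for the group order q (toy choice)


def mat_inv_mod(M, q):
    """Invert a square matrix over F_q by Gauss-Jordan elimination."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] % q), None)
        if pivot is None:
            raise ValueError("matrix is singular mod q")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, q)
        A[col] = [(a * inv) % q for a in A[col]]
        for r in range(n):
            if r != col and A[r][col] % q:
                factor = A[r][col]
                A[r] = [(a - factor * b) % q for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def random_dual_bases(n, q, rng):
    """Sample X in GL(n, F_q); return rows of B (= X) and B* (= (X^T)^-1)."""
    while True:
        X = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
        try:
            X_inv = mat_inv_mod(X, q)
        except ValueError:
            continue  # resample until X is invertible
        B_star = [list(col) for col in zip(*X_inv)]  # transpose of X^-1
        return X, B_star


def pairing_exponent(x, y, q):
    """Exponent of g_T in e(x, y): the inner product of the exponent vectors."""
    return sum(a * b for a, b in zip(x, y)) % q


def combine(coeffs, basis, q):
    """Linear combination sum_i coeffs[i] * basis[i], in exponent form."""
    n = len(basis[0])
    return [sum(c * b[j] for c, b in zip(coeffs, basis)) % q for j in range(n)]


rng = random.Random(1)
B, B_star = random_dual_bases(5, Q, rng)
# Dual orthonormality: e(b_i, b*_j) = g_T^{delta_ij}.
assert all(pairing_exponent(B[i], B_star[j], Q) == int(i == j)
           for i in range(5) for j in range(5))

# Trapdoor test with N1 = 4, N2 = 2: t* lies in span<b*_1, b*_2>.
t_star = combine([5, 9, 0, 0, 0], B_star, Q)
v = combine([0, 0, 3, 7, 0], B, Q)  # no b_1, b_2 component
u = combine([2, 4, 3, 7, 0], B, Q)  # has b_1, b_2 components
assert pairing_exponent(v, t_star, Q) == 0               # e(v, t*) = 1
assert pairing_exponent(u, t_star, Q) == (2 * 5 + 4 * 9) % Q  # e(u, t*) != 1 here
```

The last two assertions show why $\boldsymbol{t}^*$ works as a trapdoor: by duality only the $\boldsymbol{b}_1, \ldots, \boldsymbol{b}_{N_2}$ coefficients of the tested vector survive in the pairing.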


1.4 Related Works on Our Approach 


Higher dimensional vector treatment of bilinear pairing groups have been already 
employed in the literature especially in the areas of IBE, ABE and BE (e.g., 
ATION SIT). For example, in a typical vector treatment, two vector forms 
of $P := (g_1^{x_1}, \ldots, g_1^{x_n})$ and $Q := (g_2^{y_1}, \ldots, g_2^{y_n})$ are set and pairing for $P$ and $Q$
is operated as $e(P, Q) := \prod_{i=1}^{n} e(g_1^{x_i}, g_2^{y_i})$. Such a treatment can be rephrased in
our approach using the (symmetric pairing) notations shown in Section 1.3 such
that $P = x_1 \boldsymbol{a}_1 + \cdots + x_n \boldsymbol{a}_n$ and $Q = y_1 \boldsymbol{a}^*_1 + \cdots + y_n \boldsymbol{a}^*_n$ over canonical basis $\mathbb{A}$
and $\mathbb{A}^*$.

The major drawback of this approach is the easily decomposable property over
$\mathbb{A}$ (and $\mathbb{A}^*$). That is, it is easy to decompose $x_i \boldsymbol{a}_i = (1, \ldots, 1, g_1^{x_i}, 1, \ldots, 1)$ from
$P := x_1 \boldsymbol{a}_1 + \cdots + x_n \boldsymbol{a}_n = (g_1^{x_1}, \ldots, g_1^{x_n})$.

In contrast, the current approach employs basis $\mathbb{B}$ that is linearly transformed
from $\mathbb{A}$ using a secret random matrix $X \in \mathbb{F}_q^{n \times n}$. A remarkable property over $\mathbb{B}$
is that it seems hard to decompose $x_i \boldsymbol{b}_i$ from $P' := x_1 \boldsymbol{b}_1 + \cdots + x_n \boldsymbol{b}_n$. In addition,
the dual orthonormal basis $\mathbb{B}^*$ of $\mathbb{V}^*$ can be used as a source of the trapdoors to
the decomposability (see Section 1.3) through the pairing operation over $\mathbb{B}$ and
$\mathbb{B}^*$. The hard decomposability and its trapdoors are the key trick in this paper.
Note that composite order pairing groups are often employed with similar tricks,
hard decomposability of a composite order group into the prime order subgroups
and its trapdoors through factoring (e.g., [[G—22]).


1.5 Notations 


When A is a random variable or distribution, $y \xleftarrow{R} A$ denotes that $y$ is randomly
selected from A according to its distribution. When A is a set, $y \xleftarrow{U} A$ denotes
that $y$ is uniformly selected from A. $y := z$ denotes that $y$ is set, defined or
substituted by $z$. When $a$ is a fixed value, $A(x) \to a$ (e.g., $A(x) \to 1$) denotes
the event that machine (algorithm) A outputs $a$ on input $x$. A function $f : \mathbb{N} \to \mathbb{R}$
is negligible in $\lambda$, if for every constant $c > 0$, there exists an integer $n$ such that
$f(\lambda) < \lambda^{-c}$ for all $\lambda > n$.

We denote the finite field of order $q$ by $\mathbb{F}_q$. A vector symbol denotes a vector
representation over $\mathbb{F}_q$, e.g., $\vec{x}$ denotes $(x_1, \ldots, x_n) \in \mathbb{F}_q^n$. $\vec{x} \cdot \vec{v}$ denotes the
inner-product $\sum_{i=1}^{n} x_i v_i$ of two vectors $\vec{x} = (x_1, \ldots, x_n)$ and $\vec{v} = (v_1, \ldots, v_n)$.
$X^T$ denotes the transpose of matrix $X$. A bold face letter denotes an element
of vector space $\mathbb{V}$ (resp. $\mathbb{V}^*$), e.g., $\boldsymbol{x} \in \mathbb{V}$ (resp. $\boldsymbol{x}^* \in \mathbb{V}^*$). $\mathrm{span}\langle \boldsymbol{b}_1, \ldots, \boldsymbol{b}_n \rangle$
(resp. $\mathrm{span}\langle \vec{x}_1, \ldots, \vec{x}_n \rangle$) denotes the subspace generated by $\boldsymbol{b}_1, \ldots, \boldsymbol{b}_n$ (resp.
$\vec{x}_1, \ldots, \vec{x}_n$).


2 Dual Pairing Vector Spaces 


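Definition 1. “Dual pairing vector spaces (DPVS)” $(q, \mathbb{V}, \mathbb{V}^*, \mathbb{G}_T, \mathbb{A}, \mathbb{A}^*)$ are a
tuple of a prime $q$, two N-dimensional vector spaces $\mathbb{V}$ and $\mathbb{V}^*$ over $\mathbb{F}_q$, a cyclic
group $\mathbb{G}_T$ of order $q$, and their canonical bases i.e., $\mathbb{A} := (\boldsymbol{a}_1, \ldots, \boldsymbol{a}_N)$ of $\mathbb{V}$ and
$\mathbb{A}^* := (\boldsymbol{a}^*_1, \ldots, \boldsymbol{a}^*_N)$ of $\mathbb{V}^*$ that satisfy the following conditions:


1. [Non-degenerate bilinear pairing] There exists a polynomial-time computable
   nondegenerate bilinear pairing $e : \mathbb{V} \times \mathbb{V}^* \to \mathbb{G}_T$ i.e., $e(s\boldsymbol{x}, t\boldsymbol{y}) = e(\boldsymbol{x}, \boldsymbol{y})^{st}$
   and if $e(\boldsymbol{x}, \boldsymbol{y}) = 1$ for all $\boldsymbol{y} \in \mathbb{V}^*$, then $\boldsymbol{x} = \boldsymbol{0}$.

2. [Dual orthonormal bases] $\mathbb{A}$, $\mathbb{A}^*$, and $e$ satisfy $e(\boldsymbol{a}_i, \boldsymbol{a}^*_j) = g_T^{\delta_{i,j}}$ for all $i$ and
   $j$, where $\delta_{i,j} = 1$ if $i = j$, and 0 otherwise, and $g_T \ne 1 \in \mathbb{G}_T$.

3. [Distortion maps] Endomorphisms $\phi_{i,j}$ of $\mathbb{V}$ s.t. $\phi_{i,j}(\boldsymbol{a}_j) = \boldsymbol{a}_i$ and $\phi_{i,j}(\boldsymbol{a}_k) = \boldsymbol{0}$
   if $k \ne j$ are polynomial-time computable. Moreover, endomorphisms $\phi^*_{i,j}$
   of $\mathbb{V}^*$ s.t. $\phi^*_{i,j}(\boldsymbol{a}^*_j) = \boldsymbol{a}^*_i$ and $\phi^*_{i,j}(\boldsymbol{a}^*_k) = \boldsymbol{0}$ if $k \ne j$ are also polynomial-time
   computable. We call $\phi_{i,j}$ and $\phi^*_{i,j}$ “distortion maps”.

Three typical constructions are given in [17]; a product of bilinear pairing groups,
or a Jacobian variety of a supersingular curve of genus $\ge 1$ B3]. See Section 1.3
as well (where the description of distortion maps is omitted).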


3 Assumptions 


This section defines two variants of the DSP assumption, the RDSP and IDSP
assumptions. An intuition behind these assumptions is given in the Remark below.

The DPVS generation algorithm $\mathcal{G}_{\mathrm{dpvs}}$ takes input $1^\lambda$
($\lambda \in \mathbb{N}$) and $N \in \mathbb{N}$, and outputs a description of
$\mathsf{param} := (q, \mathbb{V}, \mathbb{V}^*, \mathbb{G}_T, \mathbb{A}, \mathbb{A}^*)$
with security parameter $\lambda$ and $N$-dimensional $\mathbb{V}$ and $\mathbb{V}^*$.
It can be constructed in a manner shown in £A.
We describe a random orthonormal basis generator $\mathcal{G}_{\mathrm{ob}}$ below,
which is used as a subroutine in the RDSP and IDSP instance generators.


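$\mathcal{G}_{\mathrm{ob}}(1^\lambda, N)$ : $\mathsf{param} := (q, \mathbb{V}, \mathbb{V}^*, \mathbb{G}_T, \mathbb{A}, \mathbb{A}^*) \xleftarrow{R} \mathcal{G}_{\mathrm{dpvs}}(1^\lambda, N)$,
    $X := (\chi_{i,j}) \xleftarrow{U} GL(N, \mathbb{F}_q)$, $(\vartheta_{i,j}) := (X^{T})^{-1}$,
    $\mathbf{b}_i := \sum_{j=1}^{N} \chi_{i,j}\mathbf{a}_j$, $\mathbb{B} := (\mathbf{b}_1,\ldots,\mathbf{b}_N)$, $\mathbf{b}^*_i := \sum_{j=1}^{N} \vartheta_{i,j}\mathbf{a}^*_j$, $\mathbb{B}^* := (\mathbf{b}^*_1,\ldots,\mathbf{b}^*_N)$,
    return $(\mathsf{param}, \mathbb{B}, \mathbb{B}^*)$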
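
The basis generator can be illustrated with a small sketch. It is not the paper's implementation: it assumes a toy model in which an element of $\mathbb{V}$ or $\mathbb{V}^*$ is represented by its coefficient vector with respect to the canonical basis, and the pairing is represented by its exponent (the inner product modulo $q$). The names `mat_inv_mod`, `gen_ob` and `pairing_exponent`, and the prime $q = 101$, are hypothetical choices made for illustration.

```python
import random

def mat_inv_mod(M, q):
    """Invert a square matrix over F_q (q prime) by Gauss-Jordan elimination.
    Raises ValueError if the matrix is singular."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [[x % q for x in row] + [1 if i == j else 0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_q")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, q)          # modular inverse (Python 3.8+)
        A[col] = [(x * inv) % q for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [(x - f * y) % q for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def gen_ob(q, N, rng):
    """Toy G_ob: sample X in GL(N, F_q); row i of X holds the coefficients of b_i,
    row i of (X^T)^{-1} holds the coefficients of b*_i."""
    while True:
        X = [[rng.randrange(q) for _ in range(N)] for _ in range(N)]
        try:
            theta = mat_inv_mod(transpose(X), q)
        except ValueError:
            continue                            # X was singular, resample
        return X, theta

def pairing_exponent(x, y, q):
    """Exponent of e(x, y) in the toy model: the inner product of coefficients."""
    return sum(a * b for a, b in zip(x, y)) % q

q, N = 101, 5
B, B_star = gen_ob(q, N, random.Random(0))
# Dual orthonormality: e(b_i, b*_j) = g_T^{delta_ij}.
gram = [[pairing_exponent(B[i], B_star[j], q) for j in range(N)] for i in range(N)]
```

In this model `gram` is the identity matrix, which is the dual orthonormality condition of Definition 1 carried over to the random bases $\mathbb{B}$ and $\mathbb{B}^*$.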


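We now define the RDSP and IDSP instance generators, $\mathcal{G}^{\mathrm{RDSP}}_{\beta}$ and $\mathcal{G}^{\mathrm{IDSP}}_{\beta}$:


$\mathcal{G}^{\mathrm{RDSP}}_{\beta}(1^\lambda, n)$ : $(\mathsf{param}, \mathbb{B}, \mathbb{B}^*) \xleftarrow{R} \mathcal{G}_{\mathrm{ob}}(1^\lambda, n+3)$, $\vec{y} := (y_1,\ldots,y_n) \xleftarrow{U} \mathbb{F}_q^n \setminus \{\vec{0}\}$,
    $\delta_1, \delta_2, \zeta_1, \zeta_2 \xleftarrow{U} \mathbb{F}_q$, $\mathbf{d}_{n+1} := \mathbf{b}_{n+1} + \mathbf{b}_{n+2}$, $\hat{\mathbb{B}} := (\mathbf{b}_1,\ldots,\mathbf{b}_n,\mathbf{d}_{n+1},\mathbf{b}_{n+3})$,
    $(\omega^{(k)}, \gamma_1^{(k)}, \gamma_2^{(k)})_{k=1,2,3} \xleftarrow{U} GL(3, \mathbb{F}_q)$,
    for $i = 1,\ldots,n$; $k = 1,2,3$;
        $\mathbf{h}_i^{(k)*} := \omega^{(k)}\mathbf{b}^*_i + \gamma_1^{(k)} y_i \mathbf{b}^*_{n+1} + \gamma_2^{(k)} y_i \mathbf{b}^*_{n+2}$,
    $\mathbf{e}_0 := \delta_1(\sum_{i=1}^{n} y_i\mathbf{b}_i) + \delta_2\mathbf{b}_{n+3}$,
    $\mathbf{e}_1 := \delta_1(\sum_{i=1}^{n} y_i\mathbf{b}_i) + \zeta_1\mathbf{b}_{n+1} + \zeta_2\mathbf{b}_{n+2} + \delta_2\mathbf{b}_{n+3}$,
    return $(\mathsf{param}, \hat{\mathbb{B}}, \{\mathbf{h}_i^{(k)*}\}_{i=1,\ldots,n;\,k=1,2,3}, \vec{y}, \mathbf{e}_\beta)$,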
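$\mathcal{G}^{\mathrm{IDSP}}_{\beta}(1^\lambda, n)$ : $(\mathsf{param}, \mathbb{B}, \mathbb{B}^*) \xleftarrow{R} \mathcal{G}_{\mathrm{ob}}(1^\lambda, n+3)$,
    $\vec{y} := (y_1,\ldots,y_n) \xleftarrow{U} \mathbb{F}_q^n \setminus \{\vec{0}\}$, $\vec{u} := (u_1,\ldots,u_n) \xleftarrow{U} \mathbb{F}_q^n \setminus \{\vec{0}\}$,
    $\delta_1, \delta_2, \zeta_1, \zeta_2 \xleftarrow{U} \mathbb{F}_q$, $\mathbf{d}_{n+1} := \mathbf{b}_{n+1} + \mathbf{b}_{n+2}$, $\hat{\mathbb{B}} := (\mathbf{b}_1,\ldots,\mathbf{b}_n,\mathbf{d}_{n+1},\mathbf{b}_{n+3})$,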


i U 
Fori=1,...,n; (w (k ER E = GL(F,,3), 


For i=1,...,n; k=1,2,3; 
: N k 
ai= = 0B; +e Oh tS Pha Tf = ae +908, 
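    $\mathbf{e}_0 := \delta_1(\sum_{i=1}^{n} y_i\mathbf{b}_i) + \zeta_1\mathbf{b}_{n+1} + \zeta_2\mathbf{b}_{n+2} + \delta_2\mathbf{b}_{n+3}$,
    $\mathbf{e}_1 := \delta_1(\sum_{i=1}^{n} u_i\mathbf{b}_i) + \zeta_1\mathbf{b}_{n+1} + \zeta_2\mathbf{b}_{n+2} + \delta_2\mathbf{b}_{n+3}$,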


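    return $(\mathsf{param}, \hat{\mathbb{B}}, \{\mathbf{h}_i^{(k)*}\}_{i=1,\ldots,n;\,k=1,2,3}, \vec{y}, \mathbf{e}_\beta)$.


Definition 2 (RDSP: Decisional Subspace Problem with Relevant
Dual Vector Tuples). For all security parameters $\lambda \in \mathbb{N}$, we define
the RDSP advantage of a probabilistic machine $\mathcal{B}$ as follows:


$\mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}}(\lambda) := \left| \Pr\left[\mathcal{B}(1^\lambda, \varrho) \to 1 \mid \varrho \xleftarrow{R} \mathcal{G}^{\mathrm{RDSP}}_{0}(1^\lambda, n)\right] - \Pr\left[\mathcal{B}(1^\lambda, \varrho) \to 1 \mid \varrho \xleftarrow{R} \mathcal{G}^{\mathrm{RDSP}}_{1}(1^\lambda, n)\right] \right|$.


The RDSP assumption is: for any probabilistic polynomial-time adversary $\mathcal{B}$,
$\mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}}(\lambda)$ is negligible in $\lambda$.


Definition 3 (IDSP: Decisional Subspace Problem with Irrelevant
Dual Vector Tuples). The IDSP advantage of $\mathcal{B}$,
$\mathsf{Adv}^{\mathsf{IDSP}}_{\mathcal{B}}(\lambda)$, and the IDSP
assumption are defined similarly as in Definition 2.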


In the generic DPVS model, basic operations in $\mathbb{V}$, $\mathbb{V}^*$, and
$\mathbb{G}_T$, i.e., vector additions in $\mathbb{V}$ and $\mathbb{V}^*$,
multiplication in $\mathbb{G}_T$, pairing, and distortion maps w.r.t. $\mathbb{A}$
or $\mathbb{A}^*$, are given by “generic” algorithms that act independently of the
representations of vectors or group elements.


Theorem 1. The advantages $\mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}}(\lambda)$ and
$\mathsf{Adv}^{\mathsf{IDSP}}_{\mathcal{B}}(\lambda)$ are $O(d/2^{\lambda})$ for any
adversary $\mathcal{B}$ in the generic DPVS model, where $d$ is the maximum of the
degrees of polynomials of formal variables (in the generic model game).


We will describe the proof of Theorem 1 in the full version of this paper.


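Remark (Intuition behind the Assumptions)
Here we informally explain the RDSP assumption by using a simplified one.
In the simplified RDSP assumption, $(\mathbf{h}^*_1,\ldots,\mathbf{h}^*_n)$ is given
to $\mathcal{A}$ in addition to
$(\mathbb{B} := (\mathbf{b}_1,\ldots,\mathbf{b}_{n+2}), \vec{y} := (y_1,\ldots,y_n), \mathbf{e}_\beta)$,
such that $\mathbf{h}^*_i := \omega\mathbf{b}^*_i + y_i\mathbf{b}^*_{n+1}$
($i = 1,\ldots,n$; $\omega \xleftarrow{U} \mathbb{F}_q$) and
$\mathbf{e}_\beta := \delta_1(\sum_{i=1}^{n} y_i\mathbf{b}_i) + \beta\zeta\mathbf{b}_{n+1} + \delta_2\mathbf{b}_{n+2}$
($\beta \xleftarrow{U} \{0,1\}$, $\delta_1, \delta_2, \zeta \xleftarrow{U} \mathbb{F}_q$).
The simplified RDSP assumption is that it is hard for any adversary $\mathcal{A}$,
given $(\mathbb{B}, \vec{y}, \mathbf{e}_\beta)$ along with
$(\mathbf{h}^*_1,\ldots,\mathbf{h}^*_n)$, to correctly guess $\beta$. (In the
DSP assumption, only $(\mathbb{B}, \vec{y}, \mathbf{e}_\beta)$ is given to $\mathcal{A}$.)

$(\mathbf{h}^*_1,\ldots,\mathbf{h}^*_n)$ is added in the RDSP assumption in order to
simulate the key generation oracle in the security proof of our encryption scheme
as follows: for any $\vec{v} := (v_1,\ldots,v_n)$ with $\vec{v}\cdot\vec{y} \neq 0$,
the simulator can compute a secret key $\mathbf{k}^*$ for $\vec{v}$ such that
$\mathbf{k}^* := (\vec{v}\cdot\vec{y})^{-1}\sum_{i=1}^{n} v_i\mathbf{h}^*_i = (\vec{v}\cdot\vec{y})^{-1}\omega(\sum_{i=1}^{n} v_i\mathbf{b}^*_i) + \mathbf{b}^*_{n+1} = \omega'(\sum_{i=1}^{n} v_i\mathbf{b}^*_i) + \mathbf{b}^*_{n+1}$,
where $\omega' := \omega/(\vec{v}\cdot\vec{y})$.

This secret key generation procedure, however, does not work for $\vec{v}$ with
$\vec{v}\cdot\vec{y} = 0$, since $1/(\vec{v}\cdot\vec{y})$ cannot be computed.
Therefore, $(\mathbf{h}^*_1,\ldots,\mathbf{h}^*_n)$ does not seem helpful to break
the RDSP assumption, since only a secret key $\mathbf{k}^*$ for $\vec{v}$ with
“$\vec{v}\cdot\vec{y} = 0$” is of use to guess $\beta$ by checking whether
$e(\mathbf{e}_\beta, \mathbf{k}^*) = 1$ or not.
Hence, the RDSP assumption seems to hold if the DSP assumption does.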
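
The simulation argument above can be checked in a small sketch. It assumes the toy coefficient model (an element of $\mathbb{V}^*$ is its coefficient vector with respect to $(\mathbf{b}^*_1,\ldots,\mathbf{b}^*_{n+2})$); the function names and the prime $q = 101$ are hypothetical and chosen only for illustration.

```python
def inner(x, y, q):
    return sum(a * b for a, b in zip(x, y)) % q

def dual_tuple(omega, y, q):
    """h*_i := omega * b*_i + y_i * b*_{n+1}, as coefficient vectors of length n+2."""
    n = len(y)
    return [[omega % q if j == i else 0 for j in range(n)] + [y[i] % q, 0]
            for i in range(n)]

def simulate_key(h, v, y, q):
    """k* := (v.y)^{-1} * sum_i v_i h*_i, or None when v.y = 0 (no inverse exists)."""
    vy = inner(v, y, q)
    if vy == 0:
        return None
    inv = pow(vy, -1, q)                       # modular inverse (Python 3.8+)
    width = len(h[0])
    return [(inv * sum(v[i] * h[i][j] for i in range(len(v)))) % q
            for j in range(width)]

q = 101
y = (1, 2, 3)
h = dual_tuple(7, y, q)
k_ok = simulate_key(h, (1, 1, 1), y, q)        # v.y = 6, so the key can be formed
k_bad = simulate_key(h, (3, 0, 100), y, q)     # v.y = 303 = 0 mod 101, so it cannot
```

The coefficient of $\mathbf{b}^*_{n+1}$ in `k_ok` is 1, matching the normalized form $\omega'(\sum_i v_i\mathbf{b}^*_i) + \mathbf{b}^*_{n+1}$, while the call for a predicate with $\vec{v}\cdot\vec{y} = 0$ returns `None`.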
Similarly the IDSP assumption is introduced as a variant of the DSP assumption.
In the RDSP and IDSP assumptions employed in this paper, we use a public element
$\mathbf{d}_{n+1} := \mathbf{b}_{n+1} + \mathbf{b}_{n+2}$ (in place of
$\mathbf{b}_{n+1}$ in basis $\mathbb{B}$ in the simplified one), and
$\mathbf{b}_{n+1}$ and $\mathbf{b}_{n+2}$ are not published. Such a modification is
required for the IDSP assumption since the simplified IDSP assumption does not hold.

In addition, in our RDSP (and IDSP) assumption,
$\{\mathbf{h}_i^{(k)*}\}_{i=1,\ldots,n;\,k=1,2,3}$ is employed in place of
$\{\mathbf{h}^*_i\}_{i=1,\ldots,n}$. This modification is introduced to re-randomize
the coefficients for each key generation of the simulation by a random linear
combination of $\mathbf{h}_i^{(1)*}$, $\mathbf{h}_i^{(2)*}$ and $\mathbf{h}_i^{(3)*}$.


4 Definition of Hierarchical Predicate Encryption (HPE) 


This section defines hierarchical predicate encryption (HPE) for the class of
hierarchical inner-product predicates and its security.¹

In a delegation system, it is required that a user who has a capability can
delegate to another user a more restrictive capability. In addition to this
requirement, our hierarchical inner-product encryption introduces a format of
hierarchy $\vec{\mu}$ to define common delegation structure in a system.

We call a tuple of positive integers $\vec{\mu} := (n, d; \mu_1,\ldots,\mu_d)$ s.t.
$\mu_0 = 0 < \mu_1 < \mu_2 < \cdots < \mu_d = n$ a format of hierarchy of depth $d$
attribute spaces. Let $\Sigma_\ell$ ($\ell = 1,\ldots,d$) be the sets of attributes,
where each $\Sigma_\ell := \mathbb{F}_q^{\mu_\ell - \mu_{\ell-1}} \setminus \{\vec{0}\}$.
Let the hierarchical attributes
$\Sigma := \cup_{\ell=1}^{d} (\Sigma_1 \times \cdots \times \Sigma_\ell)$, where the
union is a disjoint union. Then, for
$\vec{v}_i \in \mathbb{F}_q^{\mu_i - \mu_{i-1}} \setminus \{\vec{0}\}$, the
hierarchical predicate $f_{(\vec{v}_1,\ldots,\vec{v}_\ell)}$ on hierarchical
attributes $(\vec{x}_1,\ldots,\vec{x}_h) \in \Sigma$ is defined as follows:
$f_{(\vec{v}_1,\ldots,\vec{v}_\ell)}(\vec{x}_1,\ldots,\vec{x}_h) = 1$ iff
$\ell \leq h$ and $\vec{x}_i \cdot \vec{v}_i = 0$ for all $i$ s.t. $1 \leq i \leq \ell$.

Let the space of hierarchical predicates
$\mathcal{F} := \{ f_{(\vec{v}_1,\ldots,\vec{v}_\ell)} \mid \vec{v}_i \in \mathbb{F}_q^{\mu_i - \mu_{i-1}} \setminus \{\vec{0}\} \}$.
We call $h$ (resp. $\ell$) the level of $(\vec{x}_1,\ldots,\vec{x}_h)$ (resp.
$(\vec{v}_1,\ldots,\vec{v}_\ell)$).
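
The predicate just defined can be evaluated directly. The following is a minimal sketch, assuming attributes and predicates are given as lists of per-level tuples over $\mathbb{F}_q$; the function names and the small prime $q = 11$ are hypothetical.

```python
def inner(x, v, q):
    return sum(a * b for a, b in zip(x, v)) % q

def hier_predicate(vs, xs, q):
    """Return 1 iff len(vs) <= len(xs) and x_i . v_i = 0 (mod q) for every level
    i <= len(vs); return 0 otherwise."""
    level_v, level_x = len(vs), len(xs)
    if level_v > level_x:
        return 0
    return int(all(inner(x, v, q) == 0 for x, v in zip(xs, vs)))

q = 11
xs = [(1, 2), (3, 4), (5, 6)]          # level-3 attribute, format (6, 3; 2, 4, 6)
v1, v2, v2_other = (2, 10), (4, 8), (1, 1)

level1 = hier_predicate([v1], xs, q)                # x_1 . v_1 = 22 = 0 mod 11
level2 = hier_predicate([v1, v2], xs, q)            # additionally x_2 . v_2 = 44 = 0
mismatch = hier_predicate([v1, v2_other], xs, q)    # x_2 . (1, 1) = 7, nonzero
too_deep = hier_predicate([v1, v2], xs[:1], q)      # predicate level exceeds attribute level
```

Here `level1` and `level2` evaluate to 1, while `mismatch` and `too_deep` evaluate to 0, illustrating both conditions in the definition.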


Definition 4. Let $\vec{\mu} := (n, d; \mu_1,\ldots,\mu_d)$ s.t.
$\mu_0 = 0 < \mu_1 < \mu_2 < \cdots < \mu_d = n$ be a format of hierarchy of depth
$d$ attribute spaces. A hierarchical predicate encryption (HPE) scheme for the class
of hierarchical inner-product predicates $\mathcal{F}$ over the set of hierarchical
attributes $\Sigma$ consists of probabilistic polynomial-time algorithms Setup,
GenKey, Enc, Dec, and Delegate$_\ell$ for $\ell = 1,\ldots,d-1$. They are given as
follows:


– Setup takes as input security parameter $1^\lambda$ and format of hierarchy
  $\vec{\mu}$, and outputs (master) public key pk and (master) secret key sk.

– GenKey takes as input the master public key pk, secret key sk, and predicate
  vectors $(\vec{v}_1,\ldots,\vec{v}_\ell)$. It outputs a corresponding secret key
  $\mathsf{sk}_{(\vec{v}_1,\ldots,\vec{v}_\ell)}$.

– Enc takes as input the master public key pk, attribute vectors
  $(\vec{x}_1,\ldots,\vec{x}_h)$, where $1 \leq h \leq d$, and plaintext $m$ in some
  associated plaintext space, msg. It returns ciphertext $c$.

– Dec takes as input the master public key pk, secret key
  $\mathsf{sk}_{(\vec{v}_1,\ldots,\vec{v}_\ell)}$, where $1 \leq \ell \leq d$, and
  ciphertext $c$. It outputs either plaintext $m$ or the distinguished symbol $\bot$.

– Delegate$_\ell$ takes as input the master public key pk, $\ell$-th level secret
  key $\mathsf{sk}_{(\vec{v}_1,\ldots,\vec{v}_\ell)}$, and $(\ell+1)$-th level
  predicate vector $\vec{v}_{\ell+1}$. It returns $(\ell+1)$-th level secret key
  $\mathsf{sk}_{(\vec{v}_1,\ldots,\vec{v}_{\ell+1})}$.


¹ More general delegation structures (partial order structures) than tree
hierarchical structures can be easily realized in our HPE scheme. See the Remark in
Section 5.2.


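An HPE scheme should have the following correctness property: for all correctly
generated pk and $\mathsf{sk}_{(\vec{v}_1,\ldots,\vec{v}_\ell)}$, generate
$c \xleftarrow{R} \mathsf{Enc}(\mathsf{pk}, m, (\vec{x}_1,\ldots,\vec{x}_h))$
and $m' := \mathsf{Dec}(\mathsf{pk}, \mathsf{sk}_{(\vec{v}_1,\ldots,\vec{v}_\ell)}, c)$.
If $f_{(\vec{v}_1,\ldots,\vec{v}_\ell)}(\vec{x}_1,\ldots,\vec{x}_h) = 1$, then
$m' = m$. Otherwise, $m' \neq m$ except for negligible probability.

For $f$ and $f'$ in $\mathcal{F}$, we denote $f' \leq f$ if the predicate vector for
$f$ is a prefix of that for $f'$. For the following definition for key queries, see [22].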


Remark: We will explain the hierarchical structure by using a small (toy)
example that has three levels and each level consists of 2-dimensional space,
i.e., 6-dimensional space is employed in total. That is,
$\vec{\mu} := (n, d; \mu_1,\ldots,\mu_d) = (6, 3; 2, 4, 6)$ in this example.

A user who possesses a secret key $\mathsf{sk}_1$ in the top level, associated with
the top level predicate vector $\vec{v}_1 := (v_1, v_2)$, can delegate any value
(say $\vec{v}_2 := (v_3, v_4)$) of the second level key $\mathsf{sk}_2$ such that the
predicate vector for $\mathsf{sk}_2$ is $(\vec{v}_1, \vec{v}_2)$. Similarly, a user
who possesses a secret key in the second level, $\mathsf{sk}_2$ with
$(\vec{v}_1, \vec{v}_2)$, can delegate any value (say $\vec{v}_3 := (v_5, v_6)$) of
the third level key $\mathsf{sk}_3$ with $(\vec{v}_1, \vec{v}_2, \vec{v}_3)$.

Secret key $\mathsf{sk}_1$ with $\vec{v}_1$ can decrypt a ciphertext associated with
attribute vector $(\vec{x}_1, (*,*), (*,*)) := ((x_1, x_2), (*,*), (*,*))$ if
$\vec{x}_1 \cdot \vec{v}_1 = 0$, where $*$ denotes an arbitrary value. Secret key
$\mathsf{sk}_2$ with $(\vec{v}_1, \vec{v}_2)$ can decrypt a ciphertext with
attribute vector $(\vec{x}_1, \vec{x}_2, (*,*))$ if $\vec{x}_1 \cdot \vec{v}_1 = 0$
and $\vec{x}_2 \cdot \vec{v}_2 = 0$. However, $\mathsf{sk}_2$ cannot decrypt a
ciphertext with higher level (top level) attribute vector $\vec{x}_1 := (x_1, x_2)$
(or $(\vec{x}_1, (*,*), (*,*))$). Therefore, the capability of a delegated key
$\mathsf{sk}_2$ is more limited than that of the parent key $\mathsf{sk}_1$.

Hence, when $(\vec{v}_1, \vec{v}_2) := ((v_1, v_2), (v_3, v_4))$ is a predicate
vector for a secret key, $(\vec{v}_1, \vec{v}_2)$ is considered to be
$(\vec{v}_1, \vec{v}_2, (0,0))$, and when $\vec{x}_1 := (x_1, x_2)$ is an attribute
vector for a ciphertext, $\vec{x}_1$ is considered to be
$(\vec{x}_1, (*,*), (*,*))$, where $(*,*) \cdot (0,0) = 0$ and
$(*,*) \cdot \vec{v}_2 \neq 0$ unless $\vec{v}_2 = (0,0)$.


Definition 5. A hierarchical inner-product predicate encryption scheme for
hierarchical predicates $\mathcal{F}$ over hierarchical attributes $\Sigma$ is
selectively attribute-hiding (AH) against chosen plaintext attacks if for all
probabilistic polynomial-time adversaries $\mathcal{A}$, the advantage of
$\mathcal{A}$ in the following experiment is negligible in the security parameter.


1. $\mathcal{A}$ outputs challenge attribute vectors
   $\vec{X}^{(0)} := (\vec{x}^{(0)}_1,\ldots,\vec{x}^{(0)}_{h^{(0)}})$,
   $\vec{X}^{(1)} := (\vec{x}^{(1)}_1,\ldots,\vec{x}^{(1)}_{h^{(1)}})$.

2. Setup is run to generate keys pk and sk, and pk is given to $\mathcal{A}$.

3. $\mathcal{A}$ may adaptively make a polynomial number of queries of the following
   type:
   – [ Create key ] $\mathcal{A}$ asks the challenger to create a secret key for a
     predicate $f \in \mathcal{F}$. The challenger creates a key for $f$ without
     giving it to $\mathcal{A}$.
   – [ Create delegated key ] $\mathcal{A}$ specifies a key for predicate $f$ that
     has already been created, and asks the challenger to perform a delegation
     operation to create a child key for $f' \leq f$. The challenger computes the
     child key without giving it to the adversary.
   – [ Reveal key ] $\mathcal{A}$ asks the challenger to reveal an already-created
     key for predicate $f$ s.t. $f(\vec{X}^{(0)}) = f(\vec{X}^{(1)}) = 0$.

   Note that when key creation requests are made, $\mathcal{A}$ does not
   automatically see the created key. $\mathcal{A}$ sees a key only when it makes a
   reveal key query.

4. $\mathcal{A}$ outputs challenge plaintexts $m^{(0)}, m^{(1)}$.

5. A random bit $b$ is chosen. $\mathcal{A}$ is given
   $c^{(b)} \xleftarrow{R} \mathsf{Enc}(\mathsf{pk}, m^{(b)}, \vec{X}^{(b)})$.

6. The adversary may continue to request keys for additional predicate vectors
   subject to the restrictions given in step 3.

7. $\mathcal{A}$ outputs a bit $b'$, and succeeds if $b' = b$.


We define the advantage of $\mathcal{A}$ as the quantity
$\mathsf{Adv}^{\mathsf{HPE,AH}}_{\mathcal{A}}(\lambda) := |\Pr[b' = b] - 1/2|$.


Remark: In Definition 5, adversary $\mathcal{A}$ is not allowed to ask a key-query
for $(\vec{v}_1,\ldots,\vec{v}_\ell)$ such that
$f_{(\vec{v}_1,\ldots,\vec{v}_\ell)}(\vec{X}^{(b)}) = 1$ for some $b \in \{0,1\}$,
while in the security definition in [16], such a key-query is allowed provided that
$m^{(0)} = m^{(1)}$ and
$f_{(\vec{v}_1,\ldots,\vec{v}_\ell)}(\vec{X}^{(0)}) = f_{(\vec{v}_1,\ldots,\vec{v}_\ell)}(\vec{X}^{(1)}) = 1$.
This restriction is introduced to prove the security of the proposed HPE scheme only
under the RDSP and IDSP assumptions. If we introduce another variant of the
assumptions, we can relax this restriction. We will describe this case in the full
version of this paper.


5 The Proposed HPE Scheme 


5.1 Key Idea in Constructing the Proposed HPE 


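We will explain a key idea of the proposed HPE scheme.

First, as a special (1-level) case of the proposed construction of HPE, we will
show a predicate encryption (PE) construction for the inner-product predicate.
Through the orthonormal property of (random) dual bases
($\mathbb{B} := (\mathbf{b}_1,\ldots,\mathbf{b}_{n+3})$,
$\mathbb{B}^* := (\mathbf{b}^*_1,\ldots,\mathbf{b}^*_{n+3})$) in DPVS,
$(q, \mathbb{V}, \mathbb{V}^*, \mathbb{G}_T, \mathbb{A}, \mathbb{A}^*)$ (Sections 1.3,
2 and 3), the PE scheme for the ($n$-dimensional) inner-product predicate can be
constructed as below, where $\mathbb{V}$ and $\mathbb{V}^*$ are $(n+3)$-dimensional
spaces, the public parameter is
$(\mathbf{b}_1,\ldots,\mathbf{b}_n, \mathbf{d}_{n+1} := \mathbf{b}_{n+1} + \mathbf{b}_{n+2}, \mathbf{b}_{n+3})$
as well as the parameters of DPVS, and the master secret key is ($X$ and)
$\mathbb{B}^*$. Ciphertext $(\mathbf{c}_1, c_2)$ for attribute
$\vec{x} := (x_1,\ldots,x_n) \in \mathbb{F}_q^n$ and plaintext $m \in \mathbb{G}_T$
is $\mathbf{c}_1 := \delta_1(x_1\mathbf{b}_1 + \cdots + x_n\mathbf{b}_n) + \zeta\mathbf{d}_{n+1} + \delta_2\mathbf{b}_{n+3}$
and $c_2 := g_T^{\zeta} m$, where $\delta_1, \delta_2, \zeta \xleftarrow{U} \mathbb{F}_q$.
Secret key $\mathbf{k}^*$ with predicate $\vec{v} := (v_1,\ldots,v_n) \in \mathbb{F}_q^n$
is $\mathbf{k}^* := \sigma(v_1\mathbf{b}^*_1 + \cdots + v_n\mathbf{b}^*_n) + \eta\mathbf{b}^*_{n+1} + (1-\eta)\mathbf{b}^*_{n+2}$,
where $\sigma, \eta \xleftarrow{U} \mathbb{F}_q$. If $\vec{x}\cdot\vec{v} = 0$,
plaintext $m$ can be computed by $m = c_2/e(\mathbf{c}_1, \mathbf{k}^*)$, since

$e(\mathbf{c}_1, \mathbf{k}^*) = \prod_{i=1}^{n} e(\delta_1 x_i\mathbf{b}_i, \sigma v_i\mathbf{b}^*_i) \cdot e(\zeta\mathbf{b}_{n+1}, \eta\mathbf{b}^*_{n+1}) \cdot e(\zeta\mathbf{b}_{n+2}, (1-\eta)\mathbf{b}^*_{n+2}) = g_T^{\delta_1\sigma(\sum_{i=1}^{n} x_i v_i) + \zeta\eta + \zeta(1-\eta)} = g_T^{\delta_1\sigma(\vec{x}\cdot\vec{v}) + \zeta} = g_T^{\zeta}$.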
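
The decryption identity for the 1-level PE can be exercised in a toy model. This sketch is illustrative only and offers no security: it assumes vectors in $\mathbb{V}$ and $\mathbb{V}^*$ are represented by their coefficients with respect to the dual orthonormal bases $\mathbb{B}$ and $\mathbb{B}^*$, and elements of $\mathbb{G}_T$ by their exponents modulo $q$ (so the plaintext is an exponent and division in $\mathbb{G}_T$ becomes subtraction). All names and the prime $q = 101$ are hypothetical.

```python
import random

def enc(x, msg_exp, q, rng):
    """c1 = delta1 * sum_i x_i b_i + zeta * d_{n+1} + delta2 * b_{n+3}; c2 = g_T^zeta * m."""
    d1, d2, zeta = (rng.randrange(q) for _ in range(3))
    c1 = [(d1 * xi) % q for xi in x] + [zeta, zeta, d2]   # d_{n+1} = b_{n+1} + b_{n+2}
    c2 = (zeta + msg_exp) % q
    return c1, c2

def gen_key(v, q, rng):
    """k* = sigma * sum_i v_i b*_i + eta * b*_{n+1} + (1 - eta) * b*_{n+2}."""
    sigma, eta = rng.randrange(q), rng.randrange(q)
    return [(sigma * vi) % q for vi in v] + [eta, (1 - eta) % q, 0]

def dec(c1, c2, key, q):
    """m = c2 / e(c1, k*), written additively on exponents."""
    pairing = sum(a * b for a, b in zip(c1, key)) % q
    return (c2 - pairing) % q

q = 101
rng = random.Random(1)
x, v = (1, 2, 3), (3, 0, 100)        # x . v = 303 = 0 mod 101
c1, c2 = enc(x, 42, q, rng)
key = gen_key(v, q, rng)
recovered = dec(c1, c2, key, q)
```

Because $\vec{x}\cdot\vec{v} = 0$, the pairing exponent collapses to $\zeta\eta + \zeta(1-\eta) = \zeta$, and `recovered` equals the original exponent 42 regardless of the random coins.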

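We now explain the key idea of the proposed HPE scheme by using a small (toy)
example. Let the dimension of (predicate/attribute) vectors be 6, in which there
are three levels and each level has 2 dimensions, $\mathbb{V}$ and $\mathbb{V}^*$ be
9-dimensional spaces, the public parameter be
$\hat{\mathbb{B}} := (\mathbf{b}_1,\ldots,\mathbf{b}_6, \mathbf{d}_7, \mathbf{b}_9)$
as well as the parameters of DPVS, and the master secret key be ($X$ and)
$\mathbb{B}^* := (\mathbf{b}^*_1,\ldots,\mathbf{b}^*_9)$, where
$\mathbf{d}_7 := \mathbf{b}_7 + \mathbf{b}_8$.

Ciphertext $(\mathbf{c}_1, c_2)$ for attribute
$\vec{x} := (\vec{x}_1, \vec{x}_2, \vec{x}_3) := ((x_1, x_2), (x_3, x_4), (x_5, x_6)) \in \mathbb{F}_q^6$
and plaintext $m$ is constructed as
$\mathbf{c}_1 := \delta_1(x_1\mathbf{b}_1 + x_2\mathbf{b}_2) + \cdots + \delta_3(x_5\mathbf{b}_5 + x_6\mathbf{b}_6) + \zeta\mathbf{d}_7 + \delta_4\mathbf{b}_9$
and $c_2 := g_T^{\zeta} m$, where $\delta_1,\ldots,\delta_4, \zeta \xleftarrow{U} \mathbb{F}_q$.
If the attribute is a higher level such as $\vec{x}_1 := (x_1, x_2)$, generate a
modified attribute $\vec{x}^{+} := ((x_1, x_2), (x_3^{+}, x_4^{+}), (x_5^{+}, x_6^{+}))$,
where $(x_3^{+}, x_4^{+}, x_5^{+}, x_6^{+}) \xleftarrow{U} \mathbb{F}_q^4$. Then,
ciphertext $\mathbf{c}_1$ for attribute $\vec{x}_1$ is computed as ciphertext
$\mathbf{c}_1$ for the modified attribute $\vec{x}^{+}$.

Top level secret key $\mathbf{k}^*_1 := (\mathbf{k}^*_{1,0},\ldots,\mathbf{k}^*_{1,6})$
for predicate $\vec{v}_1 := (v_1, v_2) \in \mathbb{F}_q^2$ consists of three parts,
$\mathbf{k}^*_{1,0}$, $(\mathbf{k}^*_{1,1}, \mathbf{k}^*_{1,2})$ and
$(\mathbf{k}^*_{1,3},\ldots,\mathbf{k}^*_{1,6})$, where the first one is used for
decryption of ciphertexts, the second one for re-randomization (of a delegated key),
and the last one for delegation. Each part is:
$\mathbf{k}^*_{1,0} := \sigma_{1,0}(v_1\mathbf{b}^*_1 + v_2\mathbf{b}^*_2) + \eta_0\mathbf{b}^*_7 + (1-\eta_0)\mathbf{b}^*_8$,
$\mathbf{k}^*_{1,j} := \sigma_{1,j}(v_1\mathbf{b}^*_1 + v_2\mathbf{b}^*_2) + \eta_j\mathbf{b}^*_7 - \eta_j\mathbf{b}^*_8$ ($j = 1, 2$), and
$\mathbf{k}^*_{1,j} := \sigma_{1,j}(v_1\mathbf{b}^*_1 + v_2\mathbf{b}^*_2) + \psi\mathbf{b}^*_j + \eta_j\mathbf{b}^*_7 - \eta_j\mathbf{b}^*_8$ ($j = 3,\ldots,6$),
where $\sigma_{1,j}, \eta_j, \psi \xleftarrow{U} \mathbb{F}_q$ for $j = 0,\ldots,6$.
The first one, $\mathbf{k}^*_{1,0}$, can decrypt ciphertext $(\mathbf{c}_1, c_2)$ by
$c_2/e(\mathbf{c}_1, \mathbf{k}^*_{1,0})$, since
$e(\mathbf{c}_1, \mathbf{k}^*_{1,0}) = g_T^{\zeta}$ if an attribute of $\mathbf{c}_1$
is $((x_1, x_2), (*,*), (*,*))$ with $(x_1, x_2)\cdot(v_1, v_2) = 0$.

To delegate a secret key for the 2nd level vector $(v_3, v_4)$,
$\sigma_{2,j}(v_3\mathbf{k}^*_{1,3} + v_4\mathbf{k}^*_{1,4})$ is added to
$\mathbf{k}^*_{1,0}$ ($j = 0$), $\mathbf{0}$ ($j = 1, 2, 3$), and
$\psi^{+}\mathbf{k}^*_{1,j}$ ($j = 5, 6$). To re-randomize the coefficients of
$(v_1\mathbf{b}^*_1 + v_2\mathbf{b}^*_2)$, $\mathbf{b}^*_7$ and $\mathbf{b}^*_8$ in
the delegated key, $(\alpha_{j,1}\mathbf{k}^*_{1,1} + \alpha_{j,2}\mathbf{k}^*_{1,2})$
is also added. So, the delegated key (the second level key)
$\mathbf{k}^*_2 := (\mathbf{k}^*_{2,0},\ldots,\mathbf{k}^*_{2,3}, \mathbf{k}^*_{2,5}, \mathbf{k}^*_{2,6})$
(where $\mathbf{k}^*_{2,0}$ is for decryption,
$(\mathbf{k}^*_{2,1},\ldots,\mathbf{k}^*_{2,3})$ for re-randomization, and
$(\mathbf{k}^*_{2,5}, \mathbf{k}^*_{2,6})$ for delegation) is computed as
$\mathbf{k}^*_{2,0} := \mathbf{k}^*_{1,0} + (\alpha_{0,1}\mathbf{k}^*_{1,1} + \alpha_{0,2}\mathbf{k}^*_{1,2}) + \sigma_{2,0}(v_3\mathbf{k}^*_{1,3} + v_4\mathbf{k}^*_{1,4})$,
$\mathbf{k}^*_{2,j} := (\alpha_{j,1}\mathbf{k}^*_{1,1} + \alpha_{j,2}\mathbf{k}^*_{1,2}) + \sigma_{2,j}(v_3\mathbf{k}^*_{1,3} + v_4\mathbf{k}^*_{1,4})$ ($j = 1, 2, 3$), and
$\mathbf{k}^*_{2,j} := \psi^{+}\mathbf{k}^*_{1,j} + (\alpha_{j,1}\mathbf{k}^*_{1,1} + \alpha_{j,2}\mathbf{k}^*_{1,2}) + \sigma_{2,j}(v_3\mathbf{k}^*_{1,3} + v_4\mathbf{k}^*_{1,4})$ ($j = 5, 6$),
where $\alpha_{j,1}, \alpha_{j,2}, \sigma_{2,j}, \psi^{+} \xleftarrow{U} \mathbb{F}_q$
($j = 0, 1, 2, 3, 5, 6$).
Then, the distribution of the delegated key (by Delegate) is equivalent to that
obtained by the key generation query (GenKey) except with negligible probability
(i.e., the simulation of ‘create delegated key query’ can be equivalent to that of
‘create key query’).

In general, as for the $\ell$-th level secret key,
$\mathbf{k}^*_\ell := (\mathbf{k}^*_{\ell,0},\ldots,\mathbf{k}^*_{\ell,\ell+1}, \mathbf{k}^*_{\ell,\mu_\ell+1},\ldots,\mathbf{k}^*_{\ell,n})$,
the first one, $\mathbf{k}^*_{\ell,0}$, is used for decryption, the second part of
components, $\mathbf{k}^*_{\ell,1},\ldots,\mathbf{k}^*_{\ell,\ell+1}$, are for
re-randomization (of a delegated key), and the last part of components,
$\mathbf{k}^*_{\ell,\mu_\ell+1},\ldots,\mathbf{k}^*_{\ell,n}$, are for delegation.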


5.2 HPE Scheme 


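Setup$(1^\lambda, \vec{\mu} := (n, d; \mu_1,\ldots,\mu_d))$ : $(\mathsf{param}, \mathbb{B}, \mathbb{B}^*) \xleftarrow{R} \mathcal{G}_{\mathrm{ob}}(1^\lambda, n+3)$,
    $\mathbf{d}_{n+1} := \mathbf{b}_{n+1} + \mathbf{b}_{n+2}$, $\hat{\mathbb{B}} := (\mathbf{b}_1,\ldots,\mathbf{b}_n, \mathbf{d}_{n+1}, \mathbf{b}_{n+3})$,
    return $\mathsf{sk} := (X, \mathbb{B}^*)$, $\mathsf{pk} := (1^\lambda, \mathsf{param}, \hat{\mathbb{B}})$.

GenKey$(\mathsf{pk}, \mathsf{sk}, (\vec{v}_1,\ldots,\vec{v}_\ell) := ((v_1,\ldots,v_{\mu_1}),\ldots,(v_{\mu_{\ell-1}+1},\ldots,v_{\mu_\ell})))$ :
    $\sigma_{j,i}, \psi, \eta_j \xleftarrow{U} \mathbb{F}_q$ for $j = 0,\ldots,\ell+1, \mu_\ell+1,\ldots,n$; $i = 1,\ldots,\ell$,
    $\mathbf{k}^*_{\ell,0} := \sum_{t=1}^{\ell} \sigma_{0,t}(\sum_{i=\mu_{t-1}+1}^{\mu_t} v_i\mathbf{b}^*_i) + \eta_0\mathbf{b}^*_{n+1} + (1-\eta_0)\mathbf{b}^*_{n+2}$,
    $\mathbf{k}^*_{\ell,j} := \sum_{t=1}^{\ell} \sigma_{j,t}(\sum_{i=\mu_{t-1}+1}^{\mu_t} v_i\mathbf{b}^*_i) + \eta_j\mathbf{b}^*_{n+1} - \eta_j\mathbf{b}^*_{n+2}$
        for $j = 1,\ldots,\ell+1$,
    $\mathbf{k}^*_{\ell,j} := \sum_{t=1}^{\ell} \sigma_{j,t}(\sum_{i=\mu_{t-1}+1}^{\mu_t} v_i\mathbf{b}^*_i) + \psi\mathbf{b}^*_j + \eta_j\mathbf{b}^*_{n+1} - \eta_j\mathbf{b}^*_{n+2}$
        for $j = \mu_\ell+1,\ldots,n$,
    return $\mathbf{k}^*_\ell := (\mathbf{k}^*_{\ell,0},\ldots,\mathbf{k}^*_{\ell,\ell+1}, \mathbf{k}^*_{\ell,\mu_\ell+1},\ldots,\mathbf{k}^*_{\ell,n})$.

Enc$(\mathsf{pk}, m \in \mathbb{G}_T, (\vec{x}_1,\ldots,\vec{x}_h) := ((x_1,\ldots,x_{\mu_1}),\ldots,(x_{\mu_{h-1}+1},\ldots,x_{\mu_h})))$ :
    $(\vec{x}_{h+1},\ldots,\vec{x}_d) \xleftarrow{U} \mathbb{F}_q^{\mu_{h+1}-\mu_h} \times \cdots \times \mathbb{F}_q^{n-\mu_{d-1}}$, $\delta_1,\ldots,\delta_d, \delta_{n+3}, \zeta \xleftarrow{U} \mathbb{F}_q$,
    $\mathbf{c}_1 := \sum_{t=1}^{d} \delta_t(\sum_{i=\mu_{t-1}+1}^{\mu_t} x_i\mathbf{b}_i) + \zeta\mathbf{d}_{n+1} + \delta_{n+3}\mathbf{b}_{n+3}$, $c_2 := g_T^{\zeta} m$,
    return $(\mathbf{c}_1, c_2)$.

Dec$(\mathsf{pk}, \mathbf{k}^*_{\ell,0}, \mathbf{c}_1, c_2)$ : $m' := c_2/e(\mathbf{c}_1, \mathbf{k}^*_{\ell,0})$,
    return $m'$.

Delegate$_\ell(\mathsf{pk}, \mathbf{k}^*_\ell, \vec{v}_{\ell+1} := (v_{\mu_\ell+1},\ldots,v_{\mu_{\ell+1}}))$ :
    $\alpha_{j,i}, \sigma_j, \psi' \xleftarrow{U} \mathbb{F}_q$ for $j = 0,\ldots,\ell+2, \mu_{\ell+1}+1,\ldots,n$; $i = 1,\ldots,\ell+1$,
    $\mathbf{k}^*_{\ell+1,0} := \mathbf{k}^*_{\ell,0} + \sum_{i=1}^{\ell+1} \alpha_{0,i}\mathbf{k}^*_{\ell,i} + \sigma_0(\sum_{i=\mu_\ell+1}^{\mu_{\ell+1}} v_i\mathbf{k}^*_{\ell,i})$,
    $\mathbf{k}^*_{\ell+1,j} := \sum_{i=1}^{\ell+1} \alpha_{j,i}\mathbf{k}^*_{\ell,i} + \sigma_j(\sum_{i=\mu_\ell+1}^{\mu_{\ell+1}} v_i\mathbf{k}^*_{\ell,i})$ for $j = 1,\ldots,\ell+2$,
    $\mathbf{k}^*_{\ell+1,j} := \sum_{i=1}^{\ell+1} \alpha_{j,i}\mathbf{k}^*_{\ell,i} + \sigma_j(\sum_{i=\mu_\ell+1}^{\mu_{\ell+1}} v_i\mathbf{k}^*_{\ell,i}) + \psi'\mathbf{k}^*_{\ell,j}$ for $j = \mu_{\ell+1}+1,\ldots,n$,
    return $\mathbf{k}^*_{\ell+1} := (\mathbf{k}^*_{\ell+1,0},\ldots,\mathbf{k}^*_{\ell+1,\ell+2}, \mathbf{k}^*_{\ell+1,\mu_{\ell+1}+1},\ldots,\mathbf{k}^*_{\ell+1,n})$.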


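[Correctness] Assume that ciphertext $(\mathbf{c}_1, c_2)$ is generated by
Enc$(\mathsf{pk}, m, (\vec{x}_1,\ldots,\vec{x}_h))$ and secret key
$\mathbf{k}^*_\ell$ is generated by
GenKey$(\mathsf{pk}, \mathsf{sk}, (\vec{v}_1,\ldots,\vec{v}_\ell))$. Note that
$e(\mathbf{c}_1, \mathbf{k}^*_{\ell,0}) = g_T^{\sum_{t=1}^{\ell} \sigma_{0,t}\delta_t(\vec{x}_t\cdot\vec{v}_t) + \zeta}$.
Hence, if $\ell \leq h$ and $\vec{x}_i\cdot\vec{v}_i = 0$ for all $i$ s.t.
$1 \leq i \leq \ell$, then $e(\mathbf{c}_1, \mathbf{k}^*_{\ell,0}) = g_T^{\zeta}$.
Otherwise, $e(\mathbf{c}_1, \mathbf{k}^*_{\ell,0})$ is uniformly distributed.
Hence, correctness holds for secret keys generated by GenKey, and it also holds
for keys generated by Delegate by Claim 1.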
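
The correctness argument can be replayed in a small sketch covering only the decryption component $\mathbf{k}^*_{\ell,0}$ and Enc. As before, this is an assumption-laden toy: vectors are coefficient lists with respect to $\mathbb{B}$ and $\mathbb{B}^*$, $\mathbb{G}_T$ elements are exponents modulo $q$, the random padding levels are drawn from all of $\mathbb{F}_q$, and every name (`enc`, `gen_dec_key`, `dec`) and the prime $q = 101$ is hypothetical.

```python
import random

def enc(mu, xs, msg_exp, q, rng):
    """c1 = sum_t delta_t * (level-t block of x) + zeta * d_{n+1} + delta_{n+3} * b_{n+3}.
    Levels above h = len(xs) are filled with random attribute blocks."""
    bounds = [0] + list(mu)
    d = len(mu)
    levels = list(xs) + [
        tuple(rng.randrange(q) for _ in range(bounds[t + 1] - bounds[t]))
        for t in range(len(xs), d)
    ]
    c1 = []
    for level in levels:
        delta = rng.randrange(q)
        c1 += [(delta * xi) % q for xi in level]
    zeta = rng.randrange(q)
    c1 += [zeta, zeta, rng.randrange(q)]        # zeta * (b_{n+1} + b_{n+2}), then b_{n+3}
    return c1, (zeta + msg_exp) % q

def gen_dec_key(mu, vs, q, rng):
    """k*_{l,0} = sum_t sigma_{0,t} * (level-t block of v) + eta_0 b*_{n+1} + (1 - eta_0) b*_{n+2}."""
    n = mu[-1]
    key = []
    for v in vs:
        sigma = rng.randrange(q)
        key += [(sigma * vi) % q for vi in v]
    key += [0] * (n - len(key))                 # no component on levels above l
    eta = rng.randrange(q)
    return key + [eta, (1 - eta) % q, 0]

def dec(c1, c2, key, q):
    pairing = sum(a * b for a, b in zip(c1, key)) % q
    return (c2 - pairing) % q

mu, q = (2, 4, 6), 101
rng = random.Random(7)
xs = [(1, 2), (3, 4), (5, 6)]
vs = [(2, 100), (4, 98)]                        # x_1 . v_1 = 202 = 0, x_2 . v_2 = 404 = 0 (mod 101)
key = gen_dec_key(mu, vs, q, rng)
c1_full, c2_full = enc(mu, xs, 42, q, rng)      # level-3 ciphertext
c1_l2, c2_l2 = enc(mu, xs[:2], 17, q, rng)      # level-2 ciphertext, level 3 padded at random
m_full = dec(c1_full, c2_full, key, q)
m_l2 = dec(c1_l2, c2_l2, key, q)
```

Both ciphertexts decrypt correctly with the level-2 key, since the key has no component on level 3 and the inner products on levels 1 and 2 vanish.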


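Remark: A generalized delegation (not limited to a hierarchical delegation)
system can be constructed on (1-level) PE described in the first part of Section
5.1, where the parameters are the same as above.

In the generalized delegatable PE scheme, secret key generation procedure
GenKey$(\mathsf{pk}, \mathsf{sk}, \vec{v}_1 := (v_{1,1},\ldots,v_{1,n}))$ outputs
$\mathbf{k}^*_1 := (\mathbf{k}^*_{1,\mathrm{dec}}, \mathbf{k}^*_{1,\mathrm{ran},1}, \mathbf{k}^*_{1,\mathrm{ran},2}, \mathbf{k}^*_{1,\mathrm{del},1},\ldots,\mathbf{k}^*_{1,\mathrm{del},n})$,
where
$\mathbf{k}^*_{1,\mathrm{dec}} := \sigma_{\mathrm{dec}}(\sum_{i=1}^{n} v_{1,i}\mathbf{b}^*_i) + \eta_{\mathrm{dec}}\mathbf{b}^*_{n+1} + (1-\eta_{\mathrm{dec}})\mathbf{b}^*_{n+2}$,
$\mathbf{k}^*_{1,\mathrm{ran},j} := \sigma_{\mathrm{ran},j}(\sum_{i=1}^{n} v_{1,i}\mathbf{b}^*_i) + \eta_{\mathrm{ran},j}\mathbf{b}^*_{n+1} - \eta_{\mathrm{ran},j}\mathbf{b}^*_{n+2}$ ($j = 1, 2$),
$\mathbf{k}^*_{1,\mathrm{del},j} := \sigma_{\mathrm{del},j}(\sum_{i=1}^{n} v_{1,i}\mathbf{b}^*_i) + \psi\mathbf{b}^*_j + \eta_{\mathrm{del},j}\mathbf{b}^*_{n+1} - \eta_{\mathrm{del},j}\mathbf{b}^*_{n+2}$ ($j = 1,\ldots,n$).

To delegate secret key $\mathbf{k}^*_1$ for $\vec{v}_2 := (v_{2,1},\ldots,v_{2,n})$,
where $\vec{v}_2 \notin \mathrm{span}\langle\vec{v}_1\rangle$,
Delegate$_1(\mathsf{pk}, \mathbf{k}^*_1, \vec{v}_2)$ outputs
$\mathbf{k}^*_2 := (\mathbf{k}^*_{2,\mathrm{dec}}, \mathbf{k}^*_{2,\mathrm{ran},1}, \mathbf{k}^*_{2,\mathrm{ran},2}, \mathbf{k}^*_{2,\mathrm{ran},3}, \mathbf{k}^*_{2,\mathrm{del},1},\ldots,\mathbf{k}^*_{2,\mathrm{del},n})$.
Here,
$\mathbf{k}^*_{2,\mathrm{dec}} := \mathbf{k}^*_{1,\mathrm{dec}} + \sum_{i=1}^{2} \alpha_{\mathrm{dec},i}\mathbf{k}^*_{1,\mathrm{ran},i} + \sigma_{2,\mathrm{dec}}(\sum_{i=1}^{n} v_{2,i}\mathbf{k}^*_{1,\mathrm{del},i})$,
$\mathbf{k}^*_{2,\mathrm{ran},j} := \sum_{i=1}^{2} \alpha_{\mathrm{ran},j,i}\mathbf{k}^*_{1,\mathrm{ran},i} + \sigma_{2,\mathrm{ran},j}(\sum_{i=1}^{n} v_{2,i}\mathbf{k}^*_{1,\mathrm{del},i})$ ($j = 1, 2, 3$),
$\mathbf{k}^*_{2,\mathrm{del},j} := \sum_{i=1}^{2} \alpha_{\mathrm{del},j,i}\mathbf{k}^*_{1,\mathrm{ran},i} + \sigma_{2,\mathrm{del},j}(\sum_{i=1}^{n} v_{2,i}\mathbf{k}^*_{1,\mathrm{del},i}) + \psi'\mathbf{k}^*_{1,\mathrm{del},j}$ ($j = 1,\ldots,n$).
Further delegation for $\mathbf{k}^*_\ell$ ($\ell = 2, 3, \ldots$) can be done in the
same manner.

Ciphertext $(\mathbf{c}_1, c_2)$ for attribute $\vec{x} := (x_1,\ldots,x_n)$ and
plaintext $m \in \mathbb{G}_T$ is the same as that of the 1-level PE. Key
$\mathbf{k}^*_1$ can decrypt $(\mathbf{c}_1, c_2)$ if $\vec{v}_1\cdot\vec{x} = 0$,
and key $\mathbf{k}^*_2$ can decrypt $(\mathbf{c}_1, c_2)$ if
$(\vec{v}_1\cdot\vec{x} = 0) \wedge (\vec{v}_2\cdot\vec{x} = 0)$. Namely, the
capability of delegated key $\mathbf{k}^*_2$ is more limited than that of its parent
key $\mathbf{k}^*_1$.

In general, the $\ell$-th delegated secret key $\mathbf{k}^*_\ell$ can decrypt
$(\mathbf{c}_1, c_2)$ if
$(\vec{v}_1\cdot\vec{x} = 0) \wedge \cdots \wedge (\vec{v}_\ell\cdot\vec{x} = 0)$,
where $\vec{v}_j \notin \mathrm{span}\langle\vec{v}_1,\ldots,\vec{v}_{j-1}\rangle$
for $2 \leq j \leq \ell$.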
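
The decryption condition of the generalized delegation can be expressed as a conjunction of inner-product tests. The sketch below only evaluates that condition (it does not model the keys themselves); the function names and the prime $q = 101$ are hypothetical.

```python
def inner(v, x, q):
    return sum(a * b for a, b in zip(v, x)) % q

def can_decrypt(predicates, x, q):
    """A key delegated along predicates (v_1, ..., v_l) decrypts a ciphertext for x
    iff v_j . x = 0 (mod q) for every j."""
    return all(inner(v, x, q) == 0 for v in predicates)

q = 101
v1, v2 = (3, 0, 100), (2, 100, 0)
x_both = (1, 2, 3)      # v1 . x = 303 = 0 and v2 . x = 202 = 0 (mod 101)
x_only_v1 = (1, 0, 3)   # v1 . x = 303 = 0 but v2 . x = 2

parent_on_both = can_decrypt([v1], x_both, q)
child_on_both = can_decrypt([v1, v2], x_both, q)
parent_on_second = can_decrypt([v1], x_only_v1, q)
child_on_second = can_decrypt([v1, v2], x_only_v1, q)
```

The parent key condition holds for both attributes, while the delegated key condition fails for `x_only_v1`, which is the sense in which the delegated key is more limited than its parent.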


5.3 Security 


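Theorem 2. The proposed HPE scheme is selectively attribute-hiding against
chosen plaintext attacks under the RDSP and IDSP assumptions. For any adversary
$\mathcal{A}$, there exist probabilistic machines $\mathcal{B}_1$ and
$\mathcal{B}_2$, whose running times are essentially the same as that of
$\mathcal{A}$, such that for any security parameter $\lambda$,


$\mathsf{Adv}^{\mathsf{HPE,AH}}_{\mathcal{A}}(\lambda) \leq \mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}_1}(\lambda) + \mathsf{Adv}^{\mathsf{IDSP}}_{\mathcal{B}_2}(\lambda) + 3\nu/q$,


where $\nu$ is the number of the adversary’s queries.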


Proof Outline: To prove the security, we employ five games, Game 0 (original
selective-security game) to Game 4 whose advantage is 0, where, roughly,
Game 1 is conceptually changed (the timing of the challenger’s coin flips is changed)
from Game 0, a delegated key query (i.e., a reveal query of an already-created
delegated key) is answered by using GenKey (in place of Delegate) in Game 2,
the plaintext part of the target ciphertext is randomized in Game 3, and the
attribute vector part of the target ciphertext is randomized in Game 4.

Since the distribution regarding each revealed key query in Game 2 is equivalent
to that in Game 1 except with probability at most $3/q$, the gap between
Games 1 and 2 is bounded by $3\nu/q$.

To prove that the gap between Games 2 and 3 is bounded by the advantage of
the RDSP assumption, target ciphertext $(\mathbf{c}_1, c_2)$ for $m^{(b)}$ is
generated by using $\mathbf{e}_\beta$ from the RDSP assumption such that
$\mathbf{c}_1 := \mathbf{e}_\beta + \zeta\mathbf{d}_{n+1}$ and
$c_2 := g_T^{\zeta} m^{(b)}$. Then $(\mathbf{c}_1, c_2)$ is a ciphertext in Game 2
when $\beta = 0$, and it is a ciphertext in Game 3 when $\beta = 1$. The key
generation oracle simulation can be perfectly executed (see the Remark after
Theorem 1). The gap between Games 3 and 4 can be evaluated similarly (through the
IDSP assumption).


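Proof of Theorem 2
To prove Theorem 2, we consider the following five games.

Game 0: Original game (Definition 5).
Game 1: Game 1 is the same as Game 0 except the following procedures:
  1. When challenger $\mathcal{C}$ gets challenge attributes
     $(\vec{x}^{(0)}_1,\ldots,\vec{x}^{(0)}_{h^{(0)}})$ and
     $(\vec{x}^{(1)}_1,\ldots,\vec{x}^{(1)}_{h^{(1)}})$ in the first step of the game,
     $\mathcal{C}$ selects (challenge) bit $b \xleftarrow{U} \{0,1\}$, and computes

     $(x_1^{+},\ldots,x_n^{+}) := (\delta_1\vec{x}_1,\ldots,\delta_d\vec{x}_d)$,

     where $h := h^{(b)}$, $(\vec{x}_1,\ldots,\vec{x}_h) := (\vec{x}^{(b)}_1,\ldots,\vec{x}^{(b)}_h)$,
     $(\vec{x}_{h+1},\ldots,\vec{x}_d) \xleftarrow{U} \mathbb{F}_q^{\mu_{h+1}-\mu_h} \times \cdots \times \mathbb{F}_q^{n-\mu_{d-1}}$,
     and $\delta_1,\ldots,\delta_d \xleftarrow{U} \mathbb{F}_q$.

  2. When $\mathcal{C}$ gets challenge plaintexts $(m^{(0)}, m^{(1)})$ from adversary
     $\mathcal{A}$, challenger $\mathcal{C}$ computes $(\mathbf{c}_1, c_2)$ as below
     and returns it to $\mathcal{A}$:

     $\mathbf{c}_1 := \sum_{i=1}^{n} x_i^{+}\mathbf{b}_i + \zeta\mathbf{d}_{n+1} + \delta_{n+3}\mathbf{b}_{n+3}$, $c_2 := g_T^{\zeta} m^{(b)}$,

     where $\delta_{n+3}, \zeta \xleftarrow{U} \mathbb{F}_q$.
Game 2: Game 2 is the same as Game 1 except the following procedures.

  1. When a create key query is issued by $\mathcal{A}$, challenger $\mathcal{C}$
     only records the specified predicates, and when a create delegated key query
     is issued, $\mathcal{C}$ only records the specified keys and predicates. In this
     step, $\mathcal{C}$ just records, but creates no corresponding keys.

  2. When a reveal key query is issued for a hierarchical (level-$\ell$) predicate
     $(\vec{v}_1,\ldots,\vec{v}_\ell)$ which has been already recorded, $\mathcal{C}$
     creates the queried key by using GenKey. In addition, there is a special rule
     such that $(\sigma_{0,1},\ldots,\sigma_{0,\ell}) \xleftarrow{U} \mathbb{F}_q^{\ell}$
     is selected again if $\sum_{t=1}^{\ell} \sigma_{0,t}\,\vec{x}^{+}_t\cdot\vec{v}_t = 0$
     in the computation process of GenKey.

Game 3: Game 3 is the same as Game 2 except the target ciphertext
  $(\mathbf{c}_1, c_2)$ is generated as follows:

  $\mathbf{c}_1 := \sum_{i=1}^{n} x_i^{+}\mathbf{b}_i + \zeta_1\mathbf{b}_{n+1} + \zeta_2\mathbf{b}_{n+2} + \delta_{n+3}\mathbf{b}_{n+3}$, $c_2 := g_T^{\zeta} m^{(b)}$,

  where $\delta_{n+3}, \zeta, \zeta_1, \zeta_2 \xleftarrow{U} \mathbb{F}_q$.
Game 4: Game 4 is the same as Game 3 except the target ciphertext
  $(\mathbf{c}_1, c_2)$ is generated as follows:

  $\mathbf{c}_1 := \sum_{i=1}^{n} u_i\mathbf{b}_i + \zeta_1\mathbf{b}_{n+1} + \zeta_2\mathbf{b}_{n+2} + \delta_{n+3}\mathbf{b}_{n+3}$, $c_2 := g_T^{\zeta} m^{(b)}$,

  where $\delta_{n+3}, \zeta, \zeta_1, \zeta_2 \xleftarrow{U} \mathbb{F}_q$ and
  $\vec{u} := (u_1,\ldots,u_n) \xleftarrow{U} \mathbb{F}_q^n \setminus \{\vec{0}\}$.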


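Let $\mathsf{Adv}^{(0)}_{\mathcal{A}}(\lambda)$ be
$\mathsf{Adv}^{\mathsf{HPE,AH}}_{\mathcal{A}}(\lambda)$ in Game 0, and
$\mathsf{Adv}^{(i)}_{\mathcal{A}}(\lambda)$ ($i = 1,\ldots,4$) be the advantage of
$\mathcal{A}$ in Game $i$. It is clear that
$\mathsf{Adv}^{(0)}_{\mathcal{A}}(\lambda) = \mathsf{Adv}^{(1)}_{\mathcal{A}}(\lambda)$,
since it is a conceptual change. It is also clear that
$\mathsf{Adv}^{(4)}_{\mathcal{A}}(\lambda) = 0$ by Lemma 4.
We will show three lemmas (Lemmas 1, 2, 3) that evaluate the gaps between
pairs of $\mathsf{Adv}^{(i)}_{\mathcal{A}}(\lambda)$ ($i = 1, 2, 3, 4$). From these
lemmas, we obtain

$\mathsf{Adv}^{\mathsf{HPE,AH}}_{\mathcal{A}}(\lambda) = \mathsf{Adv}^{(0)}_{\mathcal{A}}(\lambda) = \mathsf{Adv}^{(1)}_{\mathcal{A}}(\lambda) \leq \sum_{i=1}^{3} \left|\mathsf{Adv}^{(i)}_{\mathcal{A}}(\lambda) - \mathsf{Adv}^{(i+1)}_{\mathcal{A}}(\lambda)\right| + \mathsf{Adv}^{(4)}_{\mathcal{A}}(\lambda) \leq \mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}_1}(\lambda) + \mathsf{Adv}^{\mathsf{IDSP}}_{\mathcal{B}_2}(\lambda) + 3\nu/q$.

Lemma 1. For any adversary $\mathcal{A}$,
$|\mathsf{Adv}^{(1)}_{\mathcal{A}}(\lambda) - \mathsf{Adv}^{(2)}_{\mathcal{A}}(\lambda)| \leq 3\nu/q$.


Proof. The distribution of $\mathbf{k}^*_{\ell+1}$ generated by GenKey for a
level-$(\ell+1)$ predicate is equivalent to that by the combination of GenKey for
the level-$\ell$ predicate and Delegate$_\ell$ except with probability $2/q$, from
Claim 1. Moreover, the special rule in Game 2 causes a probability gap of at most
$1/q$ for each GenKey operation. Therefore, the revealed key distribution in Game 1
is equivalent to that in Game 2 except with probability at most
$1 - (1 - 3/q)^{\nu} \leq 3\nu/q$, since the number of delegate queries is
upper-bounded by $\nu$. Hence (by using Shoup’s difference lemma), the difference of
$\mathsf{Adv}^{(1)}_{\mathcal{A}}(\lambda)$ and
$\mathsf{Adv}^{(2)}_{\mathcal{A}}(\lambda)$ is upper-bounded by $3\nu/q$.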
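
The inequality $1 - (1 - 3/q)^{\nu} \leq 3\nu/q$ used in this proof is an instance of Bernoulli's inequality. A short sketch with exact rational arithmetic can confirm it for concrete values; the function names and the sample prime $q = 1009$ are hypothetical, and real parameters would use a cryptographically large $q$.

```python
from fractions import Fraction

def exact_gap(q, nu):
    """Probability that at least one of nu independent events of probability 3/q occurs."""
    return 1 - (1 - Fraction(3, q)) ** nu

def union_bound(q, nu):
    """The bound 3 * nu / q used in Lemma 1."""
    return Fraction(3 * nu, q)

q = 1009
checks = [(nu, exact_gap(q, nu) <= union_bound(q, nu)) for nu in (1, 2, 10, 100)]
```

For $\nu = 1$ the two quantities coincide, and for larger $\nu$ the exact gap is strictly smaller than $3\nu/q$.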


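Claim 1. If $\mathbf{k}^*_\ell$ is generated by
GenKey$(\mathsf{pk}, \mathsf{sk}, (\vec{v}_1,\ldots,\vec{v}_\ell))$, the distribution
of $\mathbf{k}^*_{\ell+1}$ generated by
Delegate$_\ell(\mathsf{pk}, \mathbf{k}^*_\ell, \vec{v}_{\ell+1})$ is equivalent to
that of $\mathbf{k}^*_{\ell+1}$ generated by
GenKey$(\mathsf{pk}, \mathsf{sk}, (\vec{v}_1,\ldots,\vec{v}_\ell, \vec{v}_{\ell+1}))$
except with probability at most $2/q$.


Proof. The distribution of level-$\ell$ key $\mathbf{k}^*_{\ell,j}$
($j = 1,\ldots,\ell+1$) is represented by that of the $\ell+1$ coefficients,
$(\sigma_{j,1},\ldots,\sigma_{j,\ell}, \eta_j)$, of
$\sum_{i=\mu_{t-1}+1}^{\mu_t} v_i\mathbf{b}^*_i$ ($t = 1,\ldots,\ell$) and
$\mathbf{b}^*_{n+1}$ (and the coefficient, $\psi$, of $\mathbf{b}^*_j$ in addition
when $j = \mu_\ell+1,\ldots,n$), since the coefficient of $\mathbf{b}^*_{n+2}$ is
determined by that of $\mathbf{b}^*_{n+1}$.

Similarly, the distribution of level-$(\ell+1)$ key $\mathbf{k}^*_{\ell+1,j}$
($j = 1,\ldots,\ell+2$) is represented by that of the $\ell+2$ coefficients,
$(\sigma'_{j,1},\ldots,\sigma'_{j,\ell+1}, \eta'_j)$.

When level-$\ell$ key $\mathbf{k}^*_{\ell,j}$ ($j = 1,\ldots,\ell+1$) is generated by
GenKey$(\mathsf{pk}, \mathsf{sk}, (\vec{v}_1,\ldots,\vec{v}_\ell))$,
$(\sigma_{j,1},\ldots,\sigma_{j,\ell}, \eta_j)_{j=1,\ldots,\ell+1}$ is uniformly
distributed.
If the coefficient matrix $(\sigma_{j,1},\ldots,\sigma_{j,\ell}, \eta_j)_{j=1,\ldots,\ell+1}$
(an $(\ell+1) \times (\ell+1)$ matrix) of $(\mathbf{k}^*_{\ell,j})_{j=1,\ldots,\ell+1}$
is regular and $\psi \neq 0$, then the coefficients,
$(\sigma'_{j,1},\ldots,\sigma'_{j,\ell+1}, \eta'_j)$, of
Delegate$_\ell(\mathsf{pk}, \mathbf{k}^*_\ell, \vec{v}_{\ell+1})$ are uniformly
distributed, i.e., Delegate$_\ell(\mathsf{pk}, \mathbf{k}^*_\ell, \vec{v}_{\ell+1})$
is equivalently distributed as
GenKey$(\mathsf{pk}, \mathsf{sk}, (\vec{v}_1,\ldots,\vec{v}_{\ell+1}))$.

Here, $(\sigma_{j,1},\ldots,\sigma_{j,\ell}, \eta_j)_{j=1,\ldots,\ell+1}$ (an
$(\ell+1) \times (\ell+1)$ matrix) of $(\mathbf{k}^*_{\ell,j})_{j=1,\ldots,\ell+1}$
is regular and $\psi \neq 0$ except with probability at most $2/q$.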


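Lemma 2. For any adversary $\mathcal{A}$, there exists a probabilistic machine
$\mathcal{B}_1$, whose running time is essentially the same as that of
$\mathcal{A}$, such that for any security parameter $\lambda$,
$|\mathsf{Adv}^{(2)}_{\mathcal{A}}(\lambda) - \mathsf{Adv}^{(3)}_{\mathcal{A}}(\lambda)| = \mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}_1}(\lambda)$.


Proof. In order to prove Lemma 2, we construct a probabilistic machine
$\mathcal{B}_1$ against the RDSP problem by using any adversary $\mathcal{A}$ in a
security game (Game 2 or 3) as a black box as follows:


1. $\mathcal{B}_1$ is given RDSP instance
   $(\mathsf{param}, \hat{\mathbb{B}}, \{\mathbf{h}_i^{(k)*}\}_{i=1,\ldots,n;\,k=1,2,3}, \vec{y}, \mathbf{e}_\beta)$.
2. $\mathcal{B}_1$ plays a role of challenger $\mathcal{C}$ in the security game
   against adversary $\mathcal{A}$.
3. When $\mathcal{B}_1$ (or challenger $\mathcal{C}$) gets challenge attributes
   $(\vec{x}^{(0)}_1,\ldots,\vec{x}^{(0)}_{h^{(0)}})$ and
   $(\vec{x}^{(1)}_1,\ldots,\vec{x}^{(1)}_{h^{(1)}})$ in the first step of the game,
   $\mathcal{B}_1$ selects (challenge) bit $b \xleftarrow{U} \{0,1\}$, and computes

   $(x_1^{+},\ldots,x_n^{+}) := (\delta_1\vec{x}_1,\ldots,\delta_d\vec{x}_d)$,

   where $h := h^{(b)}$, $(\vec{x}_1,\ldots,\vec{x}_h) := (\vec{x}^{(b)}_1,\ldots,\vec{x}^{(b)}_h)$,
   $(\vec{x}_{h+1},\ldots,\vec{x}_d) \xleftarrow{U} \mathbb{F}_q^{\mu_{h+1}-\mu_h} \times \cdots \times \mathbb{F}_q^{n-\mu_{d-1}}$,
   and $\delta_1,\ldots,\delta_d \xleftarrow{U} \mathbb{F}_q$.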

Let (m3) € {H € GL(n,F) | Y = X+- I, IH" = I}, and I = 
(m33) = (mij) T) +. Note that &+ = Y - M*. Public parameter pk is then 
calculated as follows and B, returns pk to A: 


bj := 1 Tjebo, OF = we Meee (9 = Lassie) 
B _ (b, gaS , bn, dn41, Beta) pk = Ga param, B). 


4. When a reveal key query is issued for a hierarchical (level-€) predicate 
(T1, ..., Ue) which has been already recorded, Bı answers as follows: for 
j=0,...,4+1,ue+1,...,n, By calculates 


D+H. (at +). > > 
vj = (vza Sei Oe ig) = (05,101,---,0;2V g) (1) 
U Fe * 
where gj 1,...,0je Fy. Then, Bı calculates and returns kj := (kj o,---, 
kipipi Klep ký „) using {h; T;  } in the RDSP instance: 


—_ 3 He + n * (k) 
bo = ket 40,k isi Voi ie Ti oTe » 
x p(k)* 
Kio = 0o o> 1 90,6 Doi 1 Uda Dope 1Tioħe » 
For j = 1,...,4+1,ue+1,... CA 
o3 He yt x (k) 
a= yi Qj,k,s pao 1°54 i Mi oTo + 
_ o3 He + n* (k) 
st et 4j,k,s DE 1%5,i 2vo=1 T ‘oho 
kř j = Oj fja zi Off 2 
For j = pet+1,...,n 


For i=1,..., Me,J; 


pi = Deere Doar Mere ME = Daa ük Dg= whe”, 
a= p (Lew). ky = hey + m} = z DL wpm}, 
where aok, ajk, üp < Fg for j =1,...,€+1,ue+1,...,nk =1,2,3;8 = 
1,2. 
If 0o = 0, {00,t, a0,k = Fa}k=1,2 Bits 1,...,¢ 18 selected again. For j = we + 
at eG v, Tpi = 0, {0j t, ük = Fo }x=1,2,3;t=1,...,¢ İS selected again. 
5. When $\mathcal{B}_1$ (or $\mathcal{C}$) gets challenge plaintexts
   $(m^{(0)}, m^{(1)})$ (from $\mathcal{A}$), $\mathcal{B}_1$ calculates and returns
   $(\mathbf{c}_1, c_2)$ s.t. $\mathbf{c}_1 := \mathbf{e}_\beta + \zeta\mathbf{d}_{n+1}$
   and $c_2 := g_T^{\zeta} m^{(b)}$ using $\mathbf{e}_\beta$ in the RDSP instance,
   $\zeta$, and $m^{(b)}$, where $\zeta \xleftarrow{U} \mathbb{F}_q$.
6. After the encryption query, GenKey oracle simulation for a reveal key query
   is executed as above.
7. $\mathcal{A}$ outputs bit $b'$. If $b = b'$, $\mathcal{B}_1$ outputs
   $\beta' := 1$. Otherwise, $\mathcal{B}_1$ outputs $\beta' := 0$.


To prove Lemma 2, we show Claims 2, 3 and 4.

Claim 2. Public parameter pk generated in step 3 above has the same distribution
as that in Game 2 (and Game 3).


Proof. Let $D := \begin{pmatrix} \Pi & 0 \\ 0 & I_3 \end{pmatrix}$ be the square
$(n+3) \times (n+3)$ matrix composed of $\Pi$ and the identity matrix $I_3$. Then
basis $(\tilde{\mathbf{b}}_1,\ldots,\tilde{\mathbf{b}}_n, \mathbf{b}_{n+1}, \mathbf{b}_{n+2}, \mathbf{b}_{n+3})$
of $\mathbb{V}$ is obtained from basis $\mathbb{B}$ by the linear transformation
determined by $D$. Hence, its distribution is uniform. Therefore,
$\hat{\mathbb{B}} = (\tilde{\mathbf{b}}_1,\ldots,\tilde{\mathbf{b}}_n, \mathbf{d}_{n+1}, \mathbf{b}_{n+3})$
in step 3 has the same distribution as that in Game 2 (and Game 3).


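Claim 3. Secret key $\mathbf{k}^*_\ell$ generated in steps 4 and 6 above has the
same distribution as that in Game 2 (and Game 3).


Proof. First, we verify that basis
$(\tilde{\mathbf{b}}^*_1,\ldots,\tilde{\mathbf{b}}^*_n, \mathbf{b}^*_{n+1}, \mathbf{b}^*_{n+2}, \mathbf{b}^*_{n+3})$
of $\mathbb{V}^*$ is obtained by the linear transformation $(D^{T})^{-1}$, where $D$
is defined in the proof of Claim 2. That is, it is dual orthonormal to basis
$(\tilde{\mathbf{b}}_1,\ldots,\tilde{\mathbf{b}}_n, \mathbf{b}_{n+1}, \mathbf{b}_{n+2}, \mathbf{b}_{n+3})$.
Therefore, we can consider $\mathbf{k}^*_{\ell,j}$ w.r.t. this dual orthonormal basis.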


Secret key kj generated in steps 4 and 6 is Os Ce eal) DE vo, + bt 


+05 '01b% 1 +03 02b% 12, where 01 := OD ao ryt wt. Tt, b2 := Ss Q0,k 
yet 2+, and bo = 01 + b2. Let o := Os Ao, nw” J), Then, ø, 1, 2 are 
independently uniform, since ag,, are independently uniform, and 65, 1091+05 19, = 
1. Also, from (I, the coefficients of $544 mirt vib in kj, for each 1 < t < lare all 
uniformly and independently distributed. ‘Therefore, generated kj 9 has the same 
distribution as in Game 2 and Game 3. 

Similarly, for 7 = 1,. pee l, ue+1,...,n, the j-th key ky j has independently 


uniform coefficients w.r.t. $ i2 ae pd vib" for each 1 < t < £, and the sum of the 
coefficients of b* %41 and bs 42 İS zero. B 

Finally, we investigate the distribution of the n of b} in kj; for 
j=pet1,...,n. The additional term mj — zj bar op ;m* is 


ow : 3 = HA 
Zj st, tinea) ier viibi + 2 ain) b; 
+ (Bag = zi Diir 7 1a) Bhar + (ag — zi De v iezi) bha (2) 


where g1, := (Si w TT, P2i:= Os arf?) x} and pi = p1, t P2,- 
Therefore, for j = ue + 1,...,n, the sum of the coefficients of b}, and bf, 5 in 
(2) is zero, and the coefficients of b} in ky j are common, 4 G,pw), which is 
uniformly distributed. 


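Claim 4. If $\beta = 0$, the distribution of $(\mathbf{c}_1, c_2)$ generated in
step 5 is the same as that in Game 2. If $\beta = 1$, the distribution of
$(\mathbf{c}_1, c_2)$ generated in step 5 is the same as that in Game 3.


Proof. If $\beta = 0$,
$\mathbf{c}_1 = \delta_1\sum_{i=1}^{n} y_i\mathbf{b}_i + \zeta\mathbf{d}_{n+1} + \delta_2\mathbf{b}_{n+3} = \delta_1\sum_{i=1}^{n} x_i^{+}\tilde{\mathbf{b}}_i + \zeta\mathbf{d}_{n+1} + \delta_2\mathbf{b}_{n+3}$
and $c_2 := g_T^{\zeta} m^{(b)}$. This is the target ciphertext in Game 2 with
$\mathsf{pk} := (1^\lambda, \mathsf{param}, \hat{\mathbb{B}})$. If $\beta = 1$,
$\mathbf{c}_1 = \delta_1\sum_{i=1}^{n} x_i^{+}\tilde{\mathbf{b}}_i + (\zeta + \zeta_1)\mathbf{b}_{n+1} + (\zeta + \zeta_2)\mathbf{b}_{n+2} + \delta_2\mathbf{b}_{n+3}$
and $c_2 := g_T^{\zeta} m^{(b)}$. Because $\zeta + \zeta_1$, $\zeta + \zeta_2$, and
$\zeta$ are independently uniform, this is the target ciphertext in Game 3 with
$\mathsf{pk} := (1^\lambda, \mathsf{param}, \hat{\mathbb{B}})$.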
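
The two cases of Claim 4 can be traced on coefficient vectors. This sketch assumes the toy model in which an element of $\mathbb{V}$ is its coefficient list with respect to $(\mathbf{b}_1,\ldots,\mathbf{b}_{n+3})$ of the RDSP instance; it does not model the basis change by $\Pi$. The function names and all numeric values are hypothetical.

```python
def rdsp_element(beta, y, d1, d2, z1, z2, q):
    """Coefficients of e_beta: delta1 * y on b_1..b_n, (zeta1, zeta2) on b_{n+1}, b_{n+2}
    only when beta = 1, and delta2 on b_{n+3}."""
    head = [(d1 * yi) % q for yi in y]
    middle = [z1 % q, z2 % q] if beta == 1 else [0, 0]
    return head + middle + [d2 % q]

def target_c1(e_beta, zeta, q):
    """c1 := e_beta + zeta * d_{n+1}, where d_{n+1} = b_{n+1} + b_{n+2}."""
    n = len(e_beta) - 3
    d = [0] * n + [1, 1, 0]
    return [(e + zeta * di) % q for e, di in zip(e_beta, d)]

q = 101
y, d1, d2, z1, z2, zeta = (1, 2, 3), 5, 9, 11, 13, 20
c1_beta0 = target_c1(rdsp_element(0, y, d1, d2, z1, z2, q), zeta, q)
c1_beta1 = target_c1(rdsp_element(1, y, d1, d2, z1, z2, q), zeta, q)
```

For $\beta = 0$ the coefficients of $\mathbf{b}_{n+1}$ and $\mathbf{b}_{n+2}$ are both $\zeta$ (the Game 2 form $\zeta\mathbf{d}_{n+1}$), while for $\beta = 1$ they are $\zeta + \zeta_1$ and $\zeta + \zeta_2$ (the Game 3 form).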


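From Claims 2, 3 and 4, when $\beta = 0$, the advantage of $\mathcal{A}$ in the
above game is equal to that in Game 2, i.e.,
$\mathsf{Adv}^{(2)}_{\mathcal{A}}(\lambda)$, and also is equal to
$\Pr_0 := \Pr\left[\mathcal{B}_1(1^\lambda, \varrho) \to 1 \mid \varrho \xleftarrow{R} \mathcal{G}^{\mathrm{RDSP}}_{0}(1^\lambda, n)\right]$.
Similarly, when $\beta = 1$, we see that the advantage of $\mathcal{A}$ in the above
game is equal to $\mathsf{Adv}^{(3)}_{\mathcal{A}}(\lambda)$, and also is equal to
$\Pr_1 := \Pr\left[\mathcal{B}_1(1^\lambda, \varrho) \to 1 \mid \varrho \xleftarrow{R} \mathcal{G}^{\mathrm{RDSP}}_{1}(1^\lambda, n)\right]$.
Therefore,
$|\mathsf{Adv}^{(2)}_{\mathcal{A}}(\lambda) - \mathsf{Adv}^{(3)}_{\mathcal{A}}(\lambda)| = |\Pr_0 - \Pr_1| = \mathsf{Adv}^{\mathsf{RDSP}}_{\mathcal{B}_1}(\lambda)$.
This completes the proof of Lemma 2.


Lemma 3. For any adversary $\mathcal{A}$, there exists a probabilistic machine
$\mathcal{B}_2$, whose running time is essentially the same as that of
$\mathcal{A}$, such that for any security parameter $\lambda$,
$|\mathsf{Adv}^{(3)}_{\mathcal{A}}(\lambda) - \mathsf{Adv}^{(4)}_{\mathcal{A}}(\lambda)| = \mathsf{Adv}^{\mathsf{IDSP}}_{\mathcal{B}_2}(\lambda)$.


Proof. Lemma 3 is proved similarly to Lemma 2. The proof will be given in the
full version of this paper.


Lemma 4. For any adversary $\mathcal{A}$, $\mathsf{Adv}^{(4)}_{\mathcal{A}}(\lambda) = 0$.


Proof. The value of $b$ is independent of the adversary’s view in Game 4. Hence,
$\mathsf{Adv}^{(4)}_{\mathcal{A}}(\lambda) = 0$.


Abstract. Public-key encryption schemes rely for their IND-CPA secu-
rity on per-message fresh randomness. In practice, randomness may be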
of poor quality for a variety of reasons, leading to failure of the schemes. 
Expecting the systems to improve is unrealistic. What we show in this 
paper is that we can, instead, improve the cryptography to offset the 
lack of possible randomness. We provide public-key encryption schemes 
that achieve IND-CPA security when the randomness they use is of high 
quality, but, when the latter is not the case, rather than breaking com- 
pletely, they achieve a weaker but still useful notion of security that we 
call IND-CDA. This hedged public-key encryption provides the best pos- 
sible security guarantees in the face of bad randomness. We provide sim- 
ple RO-based ways to make in-practice IND-CPA schemes hedge secure 
with minimal software changes. We also provide non-RO model schemes 
relying on lossy trapdoor functions (LTDFs) and techniques from deter- 
ministic encryption. They achieve adaptive security by establishing and 
exploiting the anonymity of LTDFs which we believe is of independent 
interest. 


1 Introduction 


Cryptography ubiquitously assumes that parties have access to sufficiently good 
randomness. In practice this assumption is often violated. This can happen be- 
cause of faulty implementations, side-channel attacks, system resets or for a 
variety of other reasons. The resulting cryptographic failures can be spectacu- 
lar PPRS]. What can we do about this? One answer is that system de- 
signers should build “better” systems, but this is clearly easier said than done. 
The reality is that random number generation is a complex and difficult task, 
and it is unrealistic to think that failures will never occur. We propose a different 
approach: designing schemes in such a way that poor randomness will have as 
little as possible impact on the security of the scheme in the following sense. 
With good randomness the scheme achieves whatever (strong) security notion 
one is targeting, but when the same scheme is fed bad (even adversarially
chosen) randomness, rather than breaking completely, it achieves some weaker but
still useful notion of security that is the best possible under the circumstances. 
We call this “hedged” cryptography. 

Previous work by Rogaway [32], Rogaway and Shrimpton [33], and Kamara
and Katz [27] considers various forms of hedging for the symmetric encryption
setting. In this paper, we initiate a study of hedged public-key encryption. We 
address two central foundational questions, namely to find appropriate defini- 
tions and to efficiently achieve them. Let us now look at all this in more detail. 


THE PROBLEM. Achieving the standard IND-CPA notion of privacy [23] requires
the encryption algorithm to be randomized. In addition to the public key and 
message, it takes as input a random string that needs to be freshly and indepen- 
dently created for each and every encryption. 

Weak (meaning, low-entropy) randomness does not merely imply a loss of 
theoretical security. It can lead to catastrophic attacks. For example, weak- 
randomness based encryption is easily seen to allow recovery of the plaintext 
from the ciphertext for the quadratic residuosity scheme of as well as the 
El Gamal encryption scheme BI]. Brown presents such an attack on RSA- 
OAEP with encryption exponent 3. Ouafi and Vaudenay present such 
an attack on Rabin-SAEP [I3]. We present an alternative attack in [Z]. 

The above would be of little concern if we could guarantee good randomness. 
Unfortunately, this fails to be true in practice. Here, an “entropy-gathering” 
process is used to get a seed which is then stretched to get “random” bits for 
the application. The theory of cryptographically strong pseudorandom number 
generators [LI] implies that the stretching can in principle be sound, and extrac- 
tors further allow us to reduce the requirement on the seed from being uniformly 
distributed to having high min-entropy, but we still need a sufficiently good seed. 
(No amount of cryptography can create randomness out of nothing!) In prac- 
tice, entropy might be gathered from timing-related operating system events or 
user keystrokes. As evidence that this process is error-prone, consider the recent 
randomness failure in Debian Linux, where a bug in the OpenSSL package led 
to insufficient entropy gathering and thence to practical attacks on the SSH 
and SSL protocols. Other exploits include B5. 


THE NEW NOTION. The idea is to provide two tiers of security. First, when the 
“randomness” is really random, the scheme should meet the standard IND-CPA 
notion of security. Otherwise, rather than failing completely, it should gracefully 
achieve some weaker but as-good-as-possible notion of security. The first impor- 
tant question we then face is to pick and formally define this fallback notion. 
Towards this, we begin by suggesting that the message being encrypted may 
also have entropy or uncertainty from the point of view of the adversary. (If not, 
what privacy is there to be preserved by encryption?) We propose to harvest this. 
In this regard, the first requirement that might come to mind is that encryption 
with weak (even adversarially-known) randomness should be as secure as deter- 
ministic encryption, meaning achieve an analog of the PRIV notion of [6]. But 
achieving this would require that the message by itself have high min-entropy. 
We can do better. Our new target notion of security, that we call Indistinguisha- 
bility under a Chosen Distribution Attack (IND-CDA), asks that security is 
guaranteed as long as the joint distribution of the message and randomness has 
sufficiently high min-entropy. In this way, we can exploit for security whatever 
entropy might be present in the randomness or the message, and in particular 
achieve security even if neither taken alone is random enough. 

Notice that if the message and randomness together have low min-entropy, 
then we cannot hope to achieve security, because an adversary can recover the 
message with high probability by trial encryption with all message-randomness 
pairs that occur with a noticeable probability. In a nutshell, our new notion 
asks that this necessary condition is also sufficient, and in this way is requiring 
security that is as good as possible. 
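
The trial-encryption argument above can be made concrete. The following is a minimal sketch, assuming a hypothetical stand-in `toy_encrypt` (a SHA-256-based function that is not a real PKE scheme and does not appear in this paper); the only property it shares with a real scheme is that the ciphertext is a deterministic function of (pk, m, r). When the set of likely message-randomness pairs is small, re-encrypting each pair and comparing recovers the message.

```python
import hashlib
from itertools import product

def toy_encrypt(pk: bytes, m: bytes, r: bytes) -> bytes:
    """Hypothetical stand-in for E(pk, m; r): deterministic given (pk, m, r)."""
    return hashlib.sha256(pk + b"|" + r + b"|" + m).digest()

def trial_encryption_attack(pk, ciphertext, messages, coins):
    """Re-encrypt every likely (m, r) pair and compare with the target."""
    for m, r in product(messages, coins):
        if toy_encrypt(pk, m, r) == ciphertext:
            return m
    return None

pk = b"public-key"
likely_messages = [b"buy", b"sell", b"hold"]
likely_coins = [bytes([i]) for i in range(16)]   # only 4 bits of randomness

# The sender encrypts "sell" with one of the 16 weak coin values.
target = toy_encrypt(pk, b"sell", bytes([11]))
recovered = trial_encryption_attack(pk, target, likely_messages, likely_coins)
```

Here the attacker needs at most 3 x 16 = 48 trial encryptions, which is why security can only be asked for when the joint min-entropy is high.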

We denote by H-IND our notion of hedged security that is satisfied by encryp- 
tion schemes that are secure both in the sense of IND-CPA and in the sense of 
IND-CDA. 


ADAPTIVITY. Our IND-CDA definition generalizes the indistinguishability-style 
formalizations of PRIV-secure deterministic encryption B2], which in turn ex- 
tended entropic security [I8]. But we consider a new dimension, namely, adaptiv- 
ity. Our adversary is allowed to specify joint message-randomness distributions 
on to-be-encrypted challenges. The adversary is said to be adaptive if these 
queries depend on the replies to previous ones. Non-adaptive H-IND means IND- 
CPA plus non-adaptive IND-CDA and adaptive H-IND means IND-CPA plus 
adaptive IND-CDA. 

Non-adaptive IND-CDA is a notion of security for randomized schemes that 
becomes identical to PRIV in the special case that the scheme is deterministic. 
Adaptive IND-CDA, when restricted to deterministic schemes, is an adaptive 
strengthening of PRIV that we think is interesting in its own right. As a conse- 
quence of the results discussed below, we get the first deterministic encryption 
schemes that achieve this stronger notion. 


SCHEMES WITH RANDOM ORACLES. Our random oracle (RO) model schemes and 
their attributes are summarized in the first two rows of the table of Figure 1.
Both REwH1 and REwH2 efficiently transform an arbitrary (randomized) IND- 
CPA scheme into a H-IND scheme with the aid of the RO. They are simple ways 
to make in-practice encryption schemes H-IND secure with minimal software 
changes. REwH1 has the advantage of not changing the public key and thus not 
requiring new certificates. It always provides non-adaptive H-IND security. It 
provides adaptive H-IND security if the starting scheme has the extra property 
of being anonymous in the sense of [4]. Anonymity is possessed by some deployed 
schemes like DHIES [1], making REwH1 attractive in this case. But some
in-practice schemes, notably RSA ones, are not anonymous. If one wants adaptive
H-IND security in this case we suggest REwH2, which provides it assuming only 
that the starting scheme is IND-CPA. It does this by adding a randomizer to
the public key, so it does require new certificates. The schemes are extensions of
the EwH deterministic encryption scheme of [6] and similar to [20].


          Non-adaptive H-IND    Adaptive H-IND

REwH1     IND-CPA               IND-CPA + ANON-CPA
REwH2     IND-CPA               IND-CPA

RtD       IND-CPA, PRIV         IND-CPA, (u-)LTDF

PtD       (u-)LTDF              (u-)LTDF


Fig. 1. Table entries for the first two rows indicate the assumptions made on the
(randomized) encryption scheme that underlies the RO-model hedged schemes in question.
The entries for standard model scheme RtD are the assumptions on the underlying
randomized and deterministic encryption schemes, respectively, and for PtD, on the
underlying deterministic encryption scheme, which is the only primitive it uses.


SCHEMES WITHOUT RANDOM ORACLES. It is easy to see that even the existence 
of a non-adaptively secure IND-CDA encryption scheme implies the existence of 
a PRIV-secure deterministic encryption (DE) scheme. Achieving PRIV without 
ROs is already hard. Indeed, fully PRIV-secure DE without ROs has not yet 
been built. Prior work, however, does show how to construct PRIV-secure DE 
without ROs for block sources [12]. (Messages being encrypted have high min- 
entropy even conditioned on previous messages.) But H-IND introduces three 
additional challenges: (1) the min-entropy guarantee is on the joint message- 
randomness distribution rather than merely on the message; (2) we want a single 
scheme that is not only IND-CDA secure but also IND-CPA-secure; and (3) the 
adversary’s queries may be adaptive. 

We are able to overcome these challenges to the best extent possible. We pro- 
vide schemes that are H-IND-secure in the same setting as the best known PRIV 
ones, namely, for block sources, where we suitably extend the latter notion to 
consider both randomness and messages. Furthermore, we achieve these results 
under the same assumptions as previous work. 

Our standard model schemes and their attributes are summarized in the last
two rows of the table of Figure 1. RtD is formed by the generic composition
of a deterministic scheme and a randomized scheme and achieves non-adaptive 
H-IND security as long as the base schemes meet their regular conditions. (That 
is, the former is PRIV-secure for block sources and the latter is IND-CPA.) 
Adaptive security requires that the deterministic scheme be a u-LTDF. (A lossy 
trapdoor function whose lossy branch is a universal hash function BIMJ.) PtD is 
simpler, merely concatenating the message to the randomness and then applying 
deterministic encryption. It achieves both non-adaptive and adaptive H-IND 
under the assumption that the deterministic scheme is a u-LTDF. For both 
schemes, the universality assumption on the LTDF can be dropped by modifying 
the scheme and using the crooked leftover hash lemma as per [1]. (This is why 
the “u” is parenthesized in the table of Figure 1.)

ANONYMOUS LTDFs. Also of independent interest, we show that any u-LTDF 
is anonymous. Here we refer to a new notion of anonymity for trapdoor functions 
that we introduce, one that strengthens the notion of (J. This step exploits an 
adaptive variant of the leftover hash lemma of [26]. 

Why anonymity? It is exploited in our proofs of adaptive security. Our new 
notion of anonymity for trapdoor functions is matched by a corresponding one 
for encryption schemes. We show that any encryption scheme that is both 
anonymous and non-adaptive H-IND secure is also adaptively H-IND secure. 
Anonymity of the u-LTDF, in our encryption schemes based on the latter prim- 
itive, allows us to show that these schemes are anonymous and thereby lift their 
non-adaptive security to adaptive. 


RELATED WORK. In the symmetric setting, several works have recognized and 
addressed the problem of security in the face of bad randomness. Concern over 
the quality of available randomness is one of Rogaway’s motivations for introduc- 
ing nonce-based symmetric encryption [32], where security relies on the nonce 
never repeating rather than being random. Rogaway and Shrimpton [33] provide 
a symmetric authenticated encryption scheme that defaults to a PRF when the 
randomness is known. 

Kamara and Katz provide symmetric encryption schemes secure against 
chosen-randomness attack (CRA). Here the adversary can obtain encryption un- 
der randomness of its choice but privacy is only required for messages encrypted 
with perfect, hidden randomness. Entropy in the messages is not considered or 
used. We in contrast seek privacy even when the randomness is bad as long as 
there is compensating entropy in the message. Also we deal with the public key 
setting. 

Many works consider achieving strong cryptography given only a “weak ran- 
dom source” B86. This is a source that does have high min-entropy but may 
not produce truly random bits. They show that many cryptographic tasks in- 
cluding symmetric encryption [28], commitment, secret-sharing, and zero knowl- 
edge are impossible in this setting. We are not in this setting. We do assume 
a small amount of initial good randomness to produce keys. (This makes sense 
because it is one-time and because otherwise we can’t hope to achieve anything 
anyway.) On the other hand our assumption on the randomness available for en- 
cryption is even weaker than in the works mentioned. (We do not even assume 
it has high min-entropy.) Our key idea is to exploit the entropy in the mes- 
sage, which is not done in 28l6fl4]. This allows us to circumvent their negative 
results. 

Waters independently proposed hedge security as well as the PtD construction 
as a way to achieve it B5]. 


2 Preliminaries 


NOTATION. Vectors are written in boldface, e.g. $\mathbf{x}$. If $\mathbf{x}$ is a vector then $|\mathbf{x}|$ denotes
its length and $\mathbf{x}[i]$ denotes its $i$-th component for $1 \le i \le |\mathbf{x}|$. We say that $\mathbf{x}$ is
a vector over $D$ if $\mathbf{x}[i] \in D$ for all $1 \le i \le |\mathbf{x}|$. Throughout, $k \in \mathbb{N}$ denotes the
security parameter and $1^k$ its unary encoding. Unless otherwise indicated, an
algorithm is randomized. The set of possible outputs of algorithm $A$ on inputs
$x_1, x_2, \ldots$ is denoted $[A(x_1, x_2, \ldots)]$. “PT” stands for polynomial-time.


GAMES. Our security definitions and proofs use code-based games [9], and so we 
recall some background from [9]. A game (look at Figure 2 for examples) has an
Initialize procedure, procedures to respond to adversary oracle queries, and a 
Finalize procedure. A game G is executed with an adversary A as follows. First, 
Initialize executes, and its outputs are the inputs to A. Then A executes, its 
oracle queries being answered by the corresponding procedures of G. When A 
terminates, its output becomes the input to the Finalize procedure. The output 
of the latter is called the output of the game, and we let $G^{A} \Rightarrow y$ denote the
event that this game output takes value y. Our convention is that the running 
time of an adversary is the time to execute the adversary with the game that 
defines security, so that the running time of all game procedures is included. 


PUBLIC-KEY ENCRYPTION. A public-key encryption (PKE) scheme is a tuple
of PT algorithms $\mathcal{AE} = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ with associated message length parameter
$n(\cdot)$ and randomness length parameter $\rho(\cdot)$. The parameter generation algorithm
$\mathcal{P}$ takes as input $1^k$ and outputs a parameter string $par$. The key generation
algorithm $\mathcal{K}$ takes input $par$ and outputs a key pair $(pk, sk)$. The encryption
algorithm $\mathcal{E}$ takes inputs $pk$, message $m \in \{0,1\}^{n(k)}$ and coins $r \in \{0,1\}^{\rho(k)}$
and returns the ciphertext denoted $\mathcal{E}(pk, m; r)$. The deterministic decryption
algorithm $\mathcal{D}$ takes input $sk$ and ciphertext $c$ and outputs either $\bot$ or a message
in $\{0,1\}^{n(k)}$. For vectors $\mathbf{m}, \mathbf{r}$ with $|\mathbf{m}| = |\mathbf{r}| = v$ we denote by $\mathcal{E}(pk, \mathbf{m}; \mathbf{r})$ the
vector $(\mathcal{E}(pk, \mathbf{m}[1]; \mathbf{r}[1]), \ldots, \mathcal{E}(pk, \mathbf{m}[v]; \mathbf{r}[v]))$. We say that $\mathcal{AE}$ is deterministic
if $\mathcal{E}$ is deterministic. (That is, $\rho(\cdot) = 0$.)

We consider the standard IND-CPA notion of security, captured by the game
$\mathrm{IND}_{\mathcal{AE},k}$ where $\mathcal{AE} = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ is an encryption scheme. In the game, Initialize
chooses a random bit $b$, generates parameters $par \xleftarrow{\$} \mathcal{P}(1^k)$ and generates a key
pair $(pk, sk) \xleftarrow{\$} \mathcal{K}(par)$ before returning $pk$ to the adversary. Procedure LR, on
input messages $m_0$ and $m_1$, returns $c \xleftarrow{\$} \mathcal{E}(pk, m_b)$. Lastly, procedure Finalize
takes as input a guess bit $b'$ and outputs true if $b = b'$ and false otherwise. An
IND-CPA adversary makes a single query $(m_0, m_1)$ to LR with $|m_0| = |m_1|$.
For IND-CPA adversary $A$ we let $\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE},A}(k) = 2 \cdot \Pr[\mathrm{IND}^{A}_{\mathcal{AE},k} \Rightarrow \text{true}] - 1$.
We say $\mathcal{AE}$ is IND-CPA secure if $\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE},A}(\cdot)$ is negligible for all PT IND-CPA
adversaries $A$.
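
To illustrate how a code-based game is executed with an adversary, here is a minimal sketch of the IND-CPA game. The class `INDGame`, the helper `advantage`, and the deliberately insecure "scheme" whose ciphertext equals the message are all hypothetical names introduced for illustration; parameter generation is folded into key generation for brevity.

```python
import secrets

class INDGame:
    """Sketch of the IND-CPA game: Initialize, LR, Finalize."""
    def __init__(self, keygen, encrypt):
        self.keygen, self.encrypt = keygen, encrypt

    def initialize(self):
        self.b = secrets.randbelow(2)       # random challenge bit b
        self.pk, self.sk = self.keygen()    # parameter generation folded into keygen
        return self.pk                      # pk is given to the adversary

    def lr(self, m0, m1):
        assert len(m0) == len(m1)           # the game requires |m0| = |m1|
        return self.encrypt(self.pk, (m0, m1)[self.b])

    def finalize(self, guess):
        return guess == self.b

def advantage(game_factory, adversary, trials=200):
    """Empirical estimate of 2 * Pr[game outputs true] - 1."""
    wins = 0
    for _ in range(trials):
        game = game_factory()
        pk = game.initialize()
        wins += game.finalize(adversary(pk, game.lr))
    return 2 * wins / trials - 1

# Deliberately insecure toy "scheme" (assumption for the demo): ciphertext = message.
insecure_keygen = lambda: (b"pk", b"sk")
insecure_encrypt = lambda pk, m: m

def adversary(pk, lr):
    c = lr(b"\x00", b"\x01")                # single LR query
    return 1 if c == b"\x01" else 0

adv = advantage(lambda: INDGame(insecure_keygen, insecure_encrypt), adversary)
```

Against this broken scheme the adversary always guesses correctly, so the estimated advantage is 1; for an IND-CPA secure scheme it should be close to 0.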


SOURCES. We generalize the notion of a source to consider a joint distribution on
the messages and the randomness with which they will be encrypted. A $t$-source
($t \ge 1$) with message length $n(\cdot)$ and randomness length $\rho(\cdot)$ is a probabilistic
algorithm $\mathcal{M}$ that on input $1^k$ returns a $(t+1)$-tuple $(\mathbf{m}_0, \ldots, \mathbf{m}_{t-1}, \mathbf{r})$ of
equal-length vectors, where $\mathbf{m}_0, \ldots, \mathbf{m}_{t-1}$ are over $\{0,1\}^{n(k)}$ and $\mathbf{r}$ is over $\{0,1\}^{\rho(k)}$.
We say that $\mathcal{M}$ has min-entropy $\mu(\cdot)$ if


$$\Pr[\, (\mathbf{m}_b[i], \mathbf{r}[i]) = (m, r) \,] \le 2^{-\mu(k)}$$

for all $k \in \mathbb{N}$, all $b \in \{0, \ldots, t-1\}$, all $i$ and all $(m, r) \in \{0,1\}^{n(k)} \times \{0,1\}^{\rho(k)}$.
We say it has conditional min-entropy $\mu(\cdot)$ if


$$\Pr[\, (\mathbf{m}_b[i], \mathbf{r}[i]) = (m, r) \mid \forall j < i \; (\mathbf{m}_b[j], \mathbf{r}[j]) = (\mathbf{m}'[j], \mathbf{r}'[j]) \,] \le 2^{-\mu(k)}$$


for all $k \in \mathbb{N}$, all $b \in \{0, \ldots, t-1\}$, all $i$, all $(m, r)$, and all vectors $\mathbf{m}', \mathbf{r}'$. A
$t$-source with message length $n(\cdot)$, randomness length $\rho(\cdot)$, and min-entropy $\mu(\cdot)$
is referred to as a $(\mu, n, \rho)$-mr-source when $t = 1$ and $\rho(\cdot) > 0$; a $(\mu, n)$-m-source
when $t = 1$ and $\rho(\cdot) = 0$; a $(\mu, n, \rho)$-mmr-source when $t = 2$ and $\rho(\cdot) > 0$;
and a $(\mu, n)$-mm-source when $t = 2$ and $\rho(\cdot) = 0$. Each “m” indicates the source
outputting one message vector and an “r” indicates a randomness vector. When
the source has conditional min-entropy $\mu(\cdot)$ we write block-source instead of
source for each of the above. A $v(\cdot)$-vector source outputs vectors of size $v(k)$ for
all $k$.
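
The min-entropy condition is on the joint distribution of one (message, randomness) coordinate. As a small worked illustration (the helper names `min_entropy` and `marginal` and the toy distribution are assumptions made here, not part of the paper), the sketch below computes the min-entropy of a joint distribution and of its marginals, showing that the pair can carry more entropy than either component alone.

```python
import math
from collections import Counter
from itertools import product

def min_entropy(dist):
    """Min-entropy -log2(max probability) of a distribution {outcome: prob}."""
    return -math.log2(max(dist.values()))

def marginal(joint, index):
    """Marginal distribution of component `index` of a joint distribution on pairs."""
    out = Counter()
    for pair, p in joint.items():
        out[pair[index]] += p
    return dict(out)

# Toy joint distribution on (m, r): m and r independent, each uniform over 4 values.
joint = {(m, r): 1 / 16 for m, r in product(range(4), range(4))}

h_joint = min_entropy(joint)            # entropy of the pair (m, r)
h_m = min_entropy(marginal(joint, 0))   # entropy of the message alone
h_r = min_entropy(marginal(joint, 1))   # entropy of the randomness alone
```

In this example the pair has 4 bits of min-entropy while the message and the randomness have 2 bits each, so a threshold of, say, 3 bits is met jointly but by neither component taken alone.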


UNIVERSAL HASH FUNCTIONS. A family of functions is a tuple $\mathcal{H} = (\mathcal{P}, \mathcal{K}, F)$
with associated message length $n(\cdot)$. It is required that the domain of $F(K, \cdot)$
is $\{0,1\}^{n(k)}$ for every $k$, every $par \in [\mathcal{P}(1^k)]$, and every $K \in [\mathcal{K}(par)]$. We say
that $\mathcal{H}$ is universal if for every $k$, all $par \in [\mathcal{P}(1^k)]$, and all distinct $x_1, x_2 \in
\{0,1\}^{n(k)}$, the probability that $F(K, x_1) = F(K, x_2)$ is at most $1/|R(par)|$ where
$R(par) = \{ F(K, x) : K \in [\mathcal{K}(par)] \text{ and } x \in \{0,1\}^{n(k)} \}$ and the probability is over
$K \xleftarrow{\$} \mathcal{K}(par)$.
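
The universality condition can be checked exhaustively for a small family. The sketch below uses the affine family $F((a,b), x) = ax + b \bmod 7$, which is an example chosen here for illustration; as a simplifying assumption its domain is $\mathbb{Z}_7$ rather than a set of bit strings. It computes the worst-case collision probability over all distinct input pairs and compares it with $1/|R|$.

```python
from fractions import Fraction
from itertools import combinations

P = 7  # small prime; domain and range are Z_7 in this toy example

def F(key, x):
    """Affine hash F((a, b), x) = a*x + b mod P."""
    a, b = key
    return (a * x + b) % P

keys = [(a, b) for a in range(P) for b in range(P)]          # uniform key space
range_size = len({F(K, x) for K in keys for x in range(P)})  # |R|

def collision_probability(x1, x2):
    """Exact probability over a uniform key that x1 and x2 collide."""
    return Fraction(sum(F(K, x1) == F(K, x2) for K in keys), len(keys))

worst = max(collision_probability(x1, x2)
            for x1, x2 in combinations(range(P), 2))
is_universal = worst <= Fraction(1, range_size)
```

Two distinct inputs collide exactly when a = 0, which happens for 7 of the 49 keys, so the bound is met with equality.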


LOSSY TRAPDOOR FUNCTIONS (LTDFs). To a deterministic PKE scheme (recall
that a family of injective trapdoor functions and a deterministic encryption
scheme are, syntactically, the same object) $\mathcal{AE}_d = (\mathcal{P}_d, \mathcal{K}_d, \mathcal{E}_d, \mathcal{D}_d)$ with message
length $n_d(\cdot)$ we can associate an $(n_d, \ell)$-lossy key generator $\mathcal{K}_l$. This is a PT
algorithm that, on input $par$, outputs a value $pk$ for which the map $\mathcal{E}_d(pk, \cdot)$
has image size at most $2^{n_d(k) - \ell(k)}$. The parameter $\ell$ is called the lossiness of the
lossy key generator. We associate to $\mathcal{AE}_d$, lossy key generator $\mathcal{K}_l$, and a LOS
adversary $A$ the function $\mathbf{Adv}^{\text{los}}_{\mathcal{AE}_d,\mathcal{K}_l,A}(k) = 2 \cdot \Pr[\mathrm{LOS}^{A}_{\mathcal{AE}_d,\mathcal{K}_l,k} \Rightarrow \text{true}] - 1$, where
game $\mathrm{LOS}_{\mathcal{AE}_d,\mathcal{K}_l}$ works as follows. Initialize chooses a random bit $b$ and
generates parameters $par \xleftarrow{\$} \mathcal{P}_d(1^k)$; if $b = 0$ it runs $(pk, sk) \xleftarrow{\$} \mathcal{K}_d(par)$ and if $b = 1$
it runs $pk \xleftarrow{\$} \mathcal{K}_l(par)$. It then returns $pk$ (to the adversary $A$). When $A$ finishes,
outputting guess $b'$, Finalize returns true if $b = b'$. We say $\mathcal{K}_l$ is
universal-inducing if $\mathcal{H} = (\mathcal{P}_d, \mathcal{K}_l, \mathcal{E}_d)$ is a family of universal hash functions with message
length $n_d$.

A deterministic encryption scheme $\mathcal{AE}_d$ is an $(n_d, \ell)$-lossy trapdoor function
(LTDF) if there exists an $(n_d, \ell)$-lossy key generator $\mathcal{K}_l$ such that $\mathbf{Adv}^{\text{los}}_{\mathcal{AE}_d,\mathcal{K}_l,A}(\cdot)$ is
negligible for all PT $A$. We say it is a universal $(n_d, \ell)$-lossy trapdoor function
(u-LTDF) if in addition $\mathcal{K}_l$ is universal-inducing.

Lossy trapdoor functions were introduced by Peikert and Waters BI], and can 
be based on a variety of number-theoretic assumptions, including the hardness of 
the decisional Diffie-Hellman problem, the worst-case hardness of lattice prob- 
lems, and the hardness of Paillier’s composite residuosity problem BIBA. 
Boldyreva et al. observed that the DDH-based construction is universal. 
proc. Initialize(1^k):
    par ←$ P(1^k)
    (pk, sk) ←$ K(par)
    b ←$ {0,1}
    Ret par

proc. LR(M):
    If pkout = true then Ret ⊥
    (m_0, m_1, r) ←$ M(1^k)
    Ret E(pk, m_b; r)

proc. RevealPK():
    pkout ← true
    Ret pk

proc. Finalize(b'):
    Ret (b = b')


Fig. 2. Game CDA_{AE,k}


3 Security against Chosen Distribution Attack 


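Let $\mathcal{AE} = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be an encryption scheme. A CDA adversary is one whose
LR queries are all mmr-sources. Game $\mathrm{CDA}_{\mathcal{AE},k}$ of Figure 2 provides the
adversary with two oracles. The advantage of CDA adversary $A$ is


$$\mathbf{Adv}^{\text{cda}}_{\mathcal{AE},A}(k) = 2 \cdot \Pr[\, \mathrm{CDA}^{A}_{\mathcal{AE},k} \Rightarrow \text{true} \,] - 1 .$$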


In the random oracle model we allow all algorithms in Game CDA to access the 
random oracle; importantly, this includes the mmr-sources. 


Discussion. Adversary A can query LR with an mmr-source of its choice, an 
output (mo, m1, r) of which represents choices of message vectors to encrypt and 
randomness with which to encrypt them. (An alternative formulation might have 
CDA adversaries query two mr-sources, and distinguish between the encryption 
of samples taken from one of these. But this would mandate that schemes ensure 
privacy of messages and randomness.) This allows A to dictate a joint distri- 
bution on the messages and randomness. In this way it conservatively models 
even adversarially-subverted random number generators. Multiple LR queries 
are allowed. In the most general case these queries may be adaptive, meaning 
depend on answers to previous queries. 

Given that multiple LR queries are allowed, one may ask why an mmr-source 
needs to produce message and randomness vectors rather than simply a single 
pair of messages and a single choice of randomness. The reason is that the 
coordinates in a vector all depend on the same coins underlying an execution of 
M, but the coins underlying the execution of the sources in different queries are 
independent. 

Note that Initialize does not return the public key pk to A. A can get it
at any time by calling RevealPK, but once it does this, LR will return ⊥.
The reason is that we inherit from deterministic encryption the unavoidable 
limitation that encryption cannot hide public-key related information about the 
plaintexts [6]. (When the randomness has low entropy, the ciphertext itself is 
such information.) 
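
The game of Figure 2, including the rule that LR returns ⊥ once the public key has been revealed, can be sketched as follows. The class `CDAGame`, the SHA-256-based `toy_encrypt`, and the example `source` are hypothetical illustrations (the toy "scheme" is not a real PKE scheme); parameter generation is folded into key generation, and `None` stands for ⊥.

```python
import hashlib
import secrets

class CDAGame:
    """Sketch of the chosen-distribution-attack game of Figure 2."""
    def __init__(self, keygen, encrypt):
        self.keygen, self.encrypt = keygen, encrypt

    def initialize(self):
        self.pk, self.sk = self.keygen()
        self.b = secrets.randbelow(2)
        self.pkout = False
        return None                  # only parameters are returned; pk is withheld

    def lr(self, source):
        if self.pkout:
            return None              # stands for the symbol ⊥
        m0, m1, r = source()         # sample (m_0, m_1, r) from the queried source
        chosen = (m0, m1)[self.b]
        # Component-wise encryption of the chosen message vector under coins r.
        return [self.encrypt(self.pk, m, ri) for m, ri in zip(chosen, r)]

    def reveal_pk(self):
        self.pkout = True
        return self.pk

    def finalize(self, guess):
        return guess == self.b

# Hypothetical toy instantiation, for exercising the game only.
toy_keygen = lambda: (secrets.token_bytes(16), None)
toy_encrypt = lambda pk, m, r: hashlib.sha256(pk + r + m).digest()

def source():
    r = [secrets.token_bytes(16) for _ in range(2)]
    return [b"m0-a", b"m0-b"], [b"m1-a", b"m1-b"], r

game = CDAGame(toy_keygen, toy_encrypt)
game.initialize()
c = game.lr(source)          # a vector of two ciphertexts
pk = game.reveal_pk()        # the adversary asks for pk
after = game.lr(source)      # further LR queries are refused
```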

As we saw in the previous section, no encryption scheme is secure when 
both messages and randomness are predictable. Formally, this means chosen- 
distribution attacks are trivial when adversaries can query mmr-sources of low 
min-entropy. Our notions (below) will therefore require security only for sources 
that have high min-entropy or high conditional min-entropy. 
EQUALITY PATTERNS. Suppose $A$ makes a query $\mathcal{M}$ which returns $(\mathbf{m}_0, \mathbf{m}_1, \mathbf{r}) =
((a, a), (a, a'), (r, r))$ for some $a \ne a'$ and random $r$. Then it can win trivially
because the (two) components of the returned vector $\mathbf{c}$ are equal if $b = 0$ and
unequal otherwise. This limitation, again inherited from deterministic
encryption [6], is inherent. To capture it we associate to an mmr-source $\mathcal{M}$ an
equality-pattern probability


$$\zeta(k) = \Pr[\, \mathrm{eq}((\mathbf{m}_0, \mathbf{r}), (\mathbf{m}_1, \mathbf{r})) = 0 \;:\; (\mathbf{m}_0, \mathbf{m}_1, \mathbf{r}) \xleftarrow{\$} \mathcal{M}(1^k) \,]$$

where $\mathrm{eq}((\mathbf{x}_1, \mathbf{x}_2), (\mathbf{y}_1, \mathbf{y}_2))$ is 1 if for all $i, j$

$$(\mathbf{x}_1[i], \mathbf{x}_2[i]) = (\mathbf{x}_1[j], \mathbf{x}_2[j]) \;\text{ iff }\; (\mathbf{y}_1[i], \mathbf{y}_2[i]) = (\mathbf{y}_1[j], \mathbf{y}_2[j]) ,$$


and 0 otherwise. We point out that LR queries that are mmr-block-sources
(and not just mmr-sources) with high conditional min-entropy have negligible
equality-pattern probability.
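
The predicate eq can be written out directly. The following is a minimal sketch (the function name `eq` follows the text; the list-of-strings representation of vectors is an assumption made here) that evaluates it on the trivial-win example above.

```python
def eq(x, y):
    """Return 1 if x = (x1, x2) and y = (y1, y2) have the same equality pattern."""
    (x1, x2), (y1, y2) = x, y
    n = len(x1)
    for i in range(n):
        for j in range(n):
            same_x = (x1[i], x2[i]) == (x1[j], x2[j])
            same_y = (y1[i], y2[i]) == (y1[j], y2[j])
            if same_x != same_y:     # the "iff" fails for this (i, j)
                return 0
    return 1

# The example from the text: m0 = (a, a), m1 = (a, a'), r = (r, r).
m0 = ["a", "a"]
m1 = ["a", "a'"]
r = ["r", "r"]
pattern_matches = eq((m0, r), (m1, r))
```

Here the two coordinates of (m0, r) coincide while those of (m1, r) do not, so eq evaluates to 0 and a source that always outputs this triple has equality-pattern probability 1.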


NOTIONS. We can assume (without loss of generality) that a CDA adversary 
makes a single RevealPK query and then no further LR queries. We say A is 
a $(\mu, n, \rho)$-adversary if all of its LR queries are $(\mu, n, \rho)$-mmr-sources. We say
that a PKE scheme $\mathcal{AE}$ with message length $n(\cdot)$ and randomness length $\rho(\cdot)$ is
IND-CDA secure for $(\mu, n, \rho)$-mmr-sources if for all PT $(\mu, n, \rho)$-adversaries $A$ the
function $\mathbf{Adv}^{\text{cda}}_{\mathcal{AE},A}(\cdot)$ is negligible. Scheme $\mathcal{AE}$ is H-IND secure for $(\mu, n, \rho)$-mmr-sources
if it is IND-CPA secure and IND-CDA secure for $(\mu, n, \rho)$-mmr-sources.
We can extend these notions to mmr-block-sources by restricting to adversaries 
that query mmr-block-sources. 


ON ADAPTIVITY. We can consider non-adaptive IND-CDA security by restrict- 
ing attention in the notions above to adversaries that only make a single LR 
query. Why do we not focus solely on this (simpler) security goal? The standard 
IND-CPA setting (implicitly) provides security against multiple, adaptive LR 
queries. This is true because in that setting a straightforward hybrid argument 
shows that security against multiple adaptive LR queries is implied by security 
against a single LR query BIB]. We wish to maintain the same standard of adap- 
tive security in the IND-CDA setting. Unfortunately, in the IND-CDA setting, 
unlike the IND-CPA setting, adaptive security is not implied by non-adaptive 
security. In short this is because a CDA adversary necessarily cannot learn the 
public key before (or while) making LR queries. To see the separation, consider 
a PKE scheme that appends to every ciphertext the public key used. This will 
not affect the security of the scheme when an adversary can only make a single 
query. However, an adaptive CDA adversary can query an mmr-source, learn 
the public key, and craft a second source that uses the public key to ensure 
ciphertexts which leak the challenge bit. 
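
The separation can be simulated. The sketch below is an illustration under stated assumptions: `enc` is a hypothetical SHA-256-based toy (not a real PKE scheme) that appends the public key to each ciphertext, vectors have length one, and the second source uses rejection sampling, which is one possible way (chosen here, not taken from the paper) for a source that knows pk to force a visible difference between the two challenge messages.

```python
import hashlib
import secrets

def keygen():
    return secrets.token_bytes(16)

def enc(pk, m, r):
    """Hypothetical toy scheme whose ciphertext ends with the public key."""
    return hashlib.sha256(pk + r + m).digest() + pk

def first_source():
    # High-entropy messages and coins; no knowledge of pk is needed.
    return secrets.token_bytes(16), secrets.token_bytes(16), secrets.token_bytes(16)

def make_second_source(pk):
    def source():
        r = secrets.token_bytes(16)
        def sample(bit):
            # Rejection-sample m so the low bit of the first ciphertext byte is `bit`.
            while True:
                m = secrets.token_bytes(16)
                if (enc(pk, m, r)[0] & 1) == bit:
                    return m
        return sample(0), sample(1), r
    return source

def play():
    pk, b = keygen(), secrets.randbelow(2)
    def lr(source):                          # LR oracle; pk is never handed out directly
        m0, m1, r = source()
        return enc(pk, (m0, m1)[b], r)
    leaked_pk = lr(first_source)[32:]        # query 1: read pk off the ciphertext
    c = lr(make_second_source(leaked_pk))    # query 2: the source depends on pk
    guess = c[0] & 1
    return guess == b

always_wins = all(play() for _ in range(20))
```

A non-adaptive adversary cannot mount this attack because its single source must be fixed before any ciphertext, and hence the public key, is seen.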

Given this, our primary goal is the stronger notion of adaptive security. That 
said, non-adaptive hedge security is also relevant because in practice adaptive
adversaries might be rare and (as we will see in Section 5) one can find
non-adaptively-secure schemes that are more efficient and/or have proofs under 
weaker assumptions. 
ADAPTIVE PRIV. A special case of our framework occurs when the PKE scheme
$\mathcal{AE}$ being considered has randomness length $\rho(k) = 0$ for all $k$ (meaning also that
adversaries query mm-sources, instead of mmr-sources). In this case we are
considering deterministic encryption, and the IND-CDA definition and notions give
a strengthening (by way of adaptivity) of the PRIV security notion from BEJ. 
(For non-adaptive adversaries the definitions are equivalent.) For clarity we will
use PRIV to refer to this special case, and let $\mathbf{Adv}^{\text{priv}}_{\mathcal{AE},A}(k) = \mathbf{Adv}^{\text{cda}}_{\mathcal{AE},A}(k)$.


RESOURCE USAGE. Recall that by our convention, the running time of a CDA 
adversary is the time for the execution of the adversary with game $\mathrm{CDA}_{\mathcal{AE},k}$.
Thus, A being PT implies that the mmr-sources that comprise A’s LR queries
are also PT. This is a distinction from [12] which will be important in our results.
Note that in practice we do not expect to see sources that are not PT, so our
definition is not restrictive. Non-PT sources were needed in [12] for showing
that single-message security implied (non-adaptive) multi-message security for 
deterministic encryption of block sources. 


4 Constructions 


Here we present several constructions for hedged encryption. The first scheme 
uses a random oracle and an IND-CPA secure probabilistic encryption scheme. 
The next two schemes derive from composing a randomized encryption scheme 
with a deterministic one (there are two ways of ordering composition). Interest- 
ingly, only one ordering will end up providing security. The final scheme con- 
verts a deterministic encryption scheme to a hedged one by padding the message 
with random bits. For the following, let $\mathcal{AE}_r = (\mathcal{P}_r, \mathcal{K}_r, \mathcal{E}_r, \mathcal{D}_r)$ be a (randomized)
PKE scheme with message length $n_r(\cdot)$ and randomness length $\rho(\cdot)$. Let
$\mathcal{AE}_d = (\mathcal{P}_d, \mathcal{K}_d, \mathcal{E}_d, \mathcal{D}_d)$ be a (deterministic) PKE scheme with message length
$n_d(\cdot)$ and randomness length always 0. Associate to $\mathcal{AE}_c$ for $c \in \{d, r\}$ the
function $\mathrm{maxclen}_c(k)$ mapping any $k$ to the maximum length (over all possible public
keys, messages, and if applicable, randomness) of a ciphertext output by $\mathcal{E}_c$.


RANDOMIZED-ENCRYPT-WITH-HASH. Let $R : \{0,1\}^* \to \{0,1\}^*$ be a random
oracle. Let $\mathrm{REwH}[\mathcal{AE}_r] = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be the scheme parameterized by randomizer
length $\kappa$ that works as follows. Parameter generation and decryption are
the same as in $\mathcal{AE}_r$. Key generation runs $\mathcal{K}_r(par_r)$ to get $(pk_r, sk_r)$, chooses
$K \xleftarrow{\$} \{0,1\}^{\kappa(k)}$, and lets $pk = (pk_r \,\|\, K)$ and $sk = sk_r$. Algorithm $\mathcal{E}$, on
input $(pk, m)$ where $pk = (pk_r \,\|\, K)$, chooses $r \xleftarrow{\$} \{0,1\}^{\rho(k)}$ and computes
$r' \leftarrow R(pk_r \,\|\, K \,\|\, r \,\|\, m)$ (where here we take $R$’s output to be of length $\rho(k)$) and
outputs $\mathcal{E}_r(pk_r, m; r')$. Intuitively, the random oracle provides perfect and (as
long as $m$ and $r$ are hard to predict) private randomness. When the key length
$\kappa(k) = 0$ for all $k$, we refer to the scheme as REwH1, while when $\kappa(k) > 0$ for all
$k$ we refer to the scheme as REwH2. The scheme extends the Encrypt-with-Hash
deterministic encryption scheme from [6], which is a special case of REwH1 when
r has length 0, and is also reminiscent of constructions in the symmetric setting 
that utilize a PRF to ensure good randomness [27983], as well as schemes using 
the Fujisaki-Okamoto transform [20].
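
The REwH transform can be sketched in a few lines. This is an illustration under loudly stated assumptions: SHA-256 stands in heuristically for the random oracle R, the underlying randomized scheme is a toy ElGamal over a 10-bit group (far too small to be secure), and all function names (`keygen_r`, `encrypt_r`, `rewh_keygen`, `rewh_encrypt`) are hypothetical. Passing `kappa_bytes=0` corresponds to REwH1 and a positive value to REwH2.

```python
import hashlib
import secrets

P, Q, G = 1019, 509, 4   # toy group: G generates the order-Q subgroup of Z_P^*

def keygen_r():
    sk = secrets.randbelow(Q - 1) + 1
    return pow(G, sk, P), sk

def encrypt_r(pk, m, coins):
    """Toy ElGamal E_r(pk, m; coins) with explicit coins in Z_Q."""
    return pow(G, coins, P), (m * pow(pk, coins, P)) % P

def decrypt_r(sk, c):
    c1, c2 = c
    return (c2 * pow(c1, Q - sk, P)) % P     # c1^(Q - sk) = c1^(-sk) in the subgroup

def R(data: bytes) -> int:
    """SHA-256 as a heuristic stand-in for the random oracle, mapped to Z_Q."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def rewh_keygen(kappa_bytes):
    pk_r, sk_r = keygen_r()
    K = secrets.token_bytes(kappa_bytes)     # randomizer; empty for REwH1
    return (pk_r, K), sk_r

def rewh_encrypt(pk, m, r=None):
    pk_r, K = pk
    if r is None:
        r = secrets.token_bytes(16)          # the (possibly weak) coins
    encode = lambda x: x.to_bytes(2, "big")  # fixed-length encoding of group elements
    r_prime = R(encode(pk_r) + K + r + encode(m))   # r' = R(pk_r || K || r || m)
    return encrypt_r(pk_r, m, r_prime)       # E_r(pk_r, m; r')

pk, sk = rewh_keygen(kappa_bytes=16)
c = rewh_encrypt(pk, 42)
```

Decryption is unchanged from the underlying scheme, so `decrypt_r(sk, c)` returns 42. With fixed coins the output is a deterministic function of (pk, m, r), which is what lets entropy in the message compensate for bad coins.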
DETERMINISTIC-THEN-RANDOMIZED. Our first standard model attempt is to
perform hedged encryption via first applying deterministic encryption and then
randomized. More formally let $\mathrm{DtR}[\mathcal{AE}_r, \mathcal{AE}_d] = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be the scheme that
works as follows. The parameter generation algorithm $\mathcal{P}$ runs $par_r \xleftarrow{\$} \mathcal{P}_r(1^k)$
and $par_d \xleftarrow{\$} \mathcal{P}_d(1^k)$ and outputs $par = (par_r, par_d)$. Key generation $\mathcal{K}$ just runs
$(pk_r, sk_r) \xleftarrow{\$} \mathcal{K}_r(par_r)$ and $(pk_d, sk_d) \xleftarrow{\$} \mathcal{K}_d(par_d)$ and outputs $pk = (pk_r, pk_d)$
and $sk = (sk_r, sk_d)$. We define encryption by


$$\mathcal{E}((pk_r, pk_d), m; r) = \mathcal{E}_r(pk_r, c \,\|\, 1 0^{\ell}; r),$$


where $c = \mathcal{E}_d(pk_d, m)$ and $\ell = n_r - |c| - 1$. Here we need that $n_r(k) > \mathrm{maxclen}_d(k)$
for all $k$. Decryption is defined in the natural way. The scheme will clearly inherit
IND-CPA security from the application of $\mathcal{E}_r$. If the deterministic encryption
scheme is PRIV secure for min-entropy $\mu$, then the composition will also be
secure if the message has min-entropy at least $\mu$. However, our strong notion of
IND-CDA security requires that schemes be secure if the joint distribution on the
message and randomness has high min-entropy. If the entropy is unfortunately
split between both the randomness and the message, then there is no guarantee
that the composition will be secure. In fact, many choices for instantiating $\mathcal{AE}_r$
and $\mathcal{AE}_d$ lead to a composition for which attacks can be exhibited (even when
the schemes are, separately, secure).


RANDOMIZED-THEN-DETERMINISTIC. We can instead apply randomized encryption
first, and then apply deterministic encryption. Define $\mathrm{RtD}[\mathcal{AE}_r, \mathcal{AE}_d] =
(\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ to work as follows. The parameter and key generation algorithms
are as for scheme DtR. Encryption is defined by


$$\mathcal{E}((pk_r, pk_d), m; r) = \mathcal{E}_d(pk_d, c \,\|\, 1 0^{\ell}),$$


where $c = \mathcal{E}_r(pk_r, m; r)$ and $\ell = n_d - |c| - 1$. Here we need that $n_d(k) >
\mathrm{maxclen}_r(k)$ for all $k$. The decryption algorithm $\mathcal{D}$ works in the natural way. As
we will see, this construction avoids the security issues of the previous, as long
as the randomized encryption scheme preserves the min-entropy of its inputs.
(For example, if for all $k$, all $par_r \in [\mathcal{P}_r(1^k)]$, and all $(pk_r, sk_r) \in [\mathcal{K}_r(par_r)]$,
$\mathcal{E}_r(pk_r, \cdot)$ is injective in $(m, r)$.) Many encryption schemes have this property; El
Gamal BJ] is one example. 
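
Both compositions above pad the inner ciphertext to the outer scheme's message length by appending a 1 followed by zeros. A minimal sketch of this padding and its inverse, assuming bit strings are represented as Python strings of '0' and '1' (the helper names `pad` and `unpad` are introduced here for illustration):

```python
def pad(c_bits: str, n: int) -> str:
    """Return c || 1 0^l with l = n - |c| - 1; requires n > |c|."""
    if n <= len(c_bits):
        raise ValueError("need n > |c|")
    return c_bits + "1" + "0" * (n - len(c_bits) - 1)

def unpad(padded: str) -> str:
    """Strip the trailing zeros and the final 1 marker to recover c."""
    return padded[:padded.rindex("1")]

padded = pad("10110", 12)      # inner ciphertext of 5 bits, outer message length 12
recovered = unpad(padded)
```

Because the marker is the last 1 in the string, the padding is unambiguous even when the inner ciphertext itself ends in zeros, which is why the strict inequality between the outer message length and the maximum inner ciphertext length is required.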


PAD-THEN-DETERMINISTIC. Our final construction dispenses entirely with the
need for a dedicated randomized encryption scheme, instead using simple padding
to directly construct a (randomized) encryption scheme from a deterministic one.
Let PtD[$\mathcal{AE}_d$] = $(\mathcal{P}_d, \mathcal{K}_d, \mathcal{E}, \mathcal{D})$ work as follows. Parameter and key generation are
inherited from the underlying (deterministic) encryption scheme. Encryption is
defined by


$$\mathcal{E}(pk_d, m; r) = \mathcal{E}_d(pk_d, r \,\|\, m).$$


Decryption proceeds by applying $\mathcal{D}_d$ to retrieve $r \,\|\, m$, and then returning $m$.
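
As an illustration only, the pad-then-deterministic construction can be sketched in Python. The deterministic scheme used here is textbook RSA with small fixed primes, a stand-in chosen for brevity; it is not a u-LTDF and offers no security. The function names, the modulus, and the byte lengths are all assumptions of this sketch.

```python
import secrets

# Hypothetical toy parameters: two Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

RHO_BYTES = 8   # randomness length rho, in bytes (assumed)
MSG_BYTES = 3   # message length n = n_d - rho, in bytes (assumed)

def det_encrypt(x: int) -> int:
    """Deterministic stand-in for E_d: textbook RSA on an integer below N."""
    return pow(x, E, N)

def det_decrypt(y: int) -> int:
    """Stand-in for D_d."""
    return pow(y, D, N)

def ptd_encrypt(m: bytes, r: bytes) -> int:
    """E(pk_d, m; r) = E_d(pk_d, r || m)."""
    assert len(m) == MSG_BYTES and len(r) == RHO_BYTES
    return det_encrypt(int.from_bytes(r + m, "big"))

def ptd_decrypt(c: int) -> bytes:
    """Apply D_d to recover r || m, then return m."""
    padded = det_decrypt(c).to_bytes(RHO_BYTES + MSG_BYTES, "big")
    return padded[RHO_BYTES:]

if __name__ == "__main__":
    r = secrets.token_bytes(RHO_BYTES)
    c = ptd_encrypt(b"hey", r)
    print(ptd_decrypt(c))
```

The padded input is 88 bits, which stays below the roughly 92-bit toy modulus, so decryption recovers r || m exactly.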
5 Non-adaptive Hedge Security


In this section we investigate the non-adaptive hedge security of REwH, RtD and 
PtD, leaving adaptive security to future sections. 


RANDOMIZED-ENCRYPT-WITH-HASH. Intuitively, the security of REwH[$\mathcal{AE}_r$] follows
from the IND-CPA security of $\mathcal{AE}_r$ and the random oracle providing “perfect”
randomness. Following [6], for any $k$ let $\mathsf{maxpk}_{\mathcal{AE}}(k)$ be the maximum of
$\Pr[\, pk = w : (pk, sk) \leftarrow_{\$} \mathcal{K}(par) \,]$, where the maximum is taken over all $w \in
\{0,1\}^*$ and all $par \in [\mathcal{P}(1^k)]$.


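Theorem 1. [REwH is non-adaptive H-IND secure]. Let $\mathcal{AE}_r = (\mathcal{P}_r, \mathcal{K}_r,
\mathcal{E}_r, \mathcal{D}_r)$ be a PKE scheme with message length $n(\cdot)$ and randomness length $\rho(\cdot)$ and
let $\mathcal{AE}$ = REwH[$\mathcal{AE}_r$] = $(\mathcal{P}_r, \mathcal{K}_r, \mathcal{E}, \mathcal{D}_r)$ be the PKE scheme constructed from it.


• (IND-CPA) Let A be an IND-CPA adversary. Then there exists an IND-CPA
adversary B such that for all $k$


$$\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE},A}(k) = \mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE}_r,B}(k)$$
where B runs in time that of A and makes the same number of queries.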
• (IND-CDA) Let A be an adversary that makes a single LR query consisting
of a $v(\cdot)$-vector $(\mu, n, \rho)$-mmr-source with equality-pattern probability $\zeta(\cdot)$
and making at most $h(\cdot)$ random oracle queries. Then there exists an IND-CPA
adversary B such that for all $k$


ind-cpa 2-h(k 
Advi a(k) < ol) (aava io + SEP) s-maxpkae, (k) ) +CH) 
Adversary B runs in time that of A and $\mathsf{maxpk}_{\mathcal{AE}_r}$ is the maximum public
key probability of $\mathcal{AE}_r$.


The first part of the theorem is straightforward to prove. The second follows
from an adaptation of the proof of security for the similar Encrypt-with-Hash
deterministic encryption scheme in [6]. Notice that the theorem holds for both
REwH1 and REwH2; the only difference is that with the latter the $\mathsf{maxpk}_{\mathcal{AE}}(k)$
term improves depending on the length $k$.


RANDOMIZED-THEN-DETERMINISTIC. Intuitively, the non-adaptive hedged security
of the RtD construction is inherited from the IND-CPA security of the
underlying randomized scheme $\mathcal{AE}_r$ and the (non-adaptive) PRIV security of
the underlying deterministic scheme $\mathcal{AE}_d$. As alluded to before, we have one
technical requirement on $\mathcal{AE}_r$ for the IND-CDA proof to work. We say $\mathcal{AE}_r =
(\mathcal{P}_r, \mathcal{K}_r, \mathcal{E}_r, \mathcal{D}_r)$ with message length $n_r(\cdot)$ and randomness length $\rho(\cdot)$ is min-entropy
preserving if for any $k$, any $par_r \in [\mathcal{P}_r(1^k)]$, any $(pk_r, sk_r) \in [\mathcal{K}_r(par_r)]$,
and for all $c \in \{0,1\}^*$ it is the case for any $(\mu, n_r, \rho)$-mr-source $M$ outputting
vectors of size one that $\Pr[\, c = \mathcal{E}_r(pk_r, m; r) : (m, r) \leftarrow_{\$} M(1^k) \,] \le 2^{-\mu}$. In
words, encryption preserves the min-entropy of the input message and randomness.
We have the following theorem.




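Theorem 2. [RtD is non-adaptive H-IND secure]. Let $\mathcal{AE}_r = (\mathcal{P}_r, \mathcal{K}_r, \mathcal{E}_r,
\mathcal{D}_r)$ be a min-entropy preserving PKE scheme with message length $n_r(\cdot)$ and
randomness length $\rho(\cdot)$. Let $\mathcal{AE}_d = (\mathcal{P}_d, \mathcal{K}_d, \mathcal{E}_d, \mathcal{D}_d)$ be a (deterministic)
encryption scheme with message length $n_d(\cdot)$ so that $n_d(\cdot) > \mathsf{maxclen}_r(\cdot)$. Let
$\mathcal{AE}$ = RtD[$\mathcal{AE}_r, \mathcal{AE}_d$] = $(\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be the PKE scheme defined in Section B}

• (IND-CPA) Let A be an IND-CPA adversary. Then there exists an IND-CPA
adversary B such that for any $k$


$$\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE},A}(k) = \mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE}_r,B}(k)$$
where B runs in time that of A plus the time to run $\mathcal{E}_d$ once.
• (IND-CDA) Let A be a CDA adversary that makes one LR query consisting
of a $v(\cdot)$-vector $(\mu, n_r, \rho)$-mmr-source (resp. block-source). Then there exists
a PRIV adversary B such that for any $k$


$$\mathbf{Adv}^{\text{ind-cda}}_{\mathcal{AE},A}(k) \le \mathbf{Adv}^{\text{priv}}_{\mathcal{AE}_d,B}(k)$$


where B runs in time that of A plus the time to run $v(k)$ executions of $\mathcal{E}_r$
and makes one LR query consisting of a $v(\cdot)$-vector $(\mu, \mathsf{maxclen}_r)$-mm-source
(resp. block-source).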


Note that the second part of the theorem states the result for either sources or
just block-sources. We briefly sketch the proof. The first part of the theorem is
immediate from the IND-CPA security of $\mathcal{AE}_r$. For the second part, any mmr-source
$M$ queried by A is converted into an mm-source $M'$ to be queried by B. This is
done by having $M'$ run $M$ to get $(\mathbf{m}_0, \mathbf{m}_1, \mathbf{r})$ and then outputting the pair of
vectors $(\mathcal{E}_r(pk_r, \mathbf{m}_0; \mathbf{r}), \mathcal{E}_r(pk_r, \mathbf{m}_1; \mathbf{r}))$. (The ciphertexts are the “messages” for $\mathcal{E}_d$.)
Because $\mathcal{AE}_r$ is min-entropy preserving, $M'$ is a source of the appropriate type.
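
The source conversion in this proof sketch can be written out as a small Python wrapper. This is a sketch under assumptions: sources are modelled as zero-argument callables returning vectors, B is assumed to hold the randomized scheme's public key, and the names `to_mm_source`, `enc_r` and `pk_r` are hypothetical.

```python
def to_mm_source(mmr_source, enc_r, pk_r):
    """Build M' from M.

    mmr_source() returns (m0, m1, r): two message vectors and a vector of coins.
    enc_r(pk, m, r) stands in for the randomized encryption E_r.
    The returned source outputs the two vectors of ciphertexts, which play the
    role of messages for the deterministic scheme.
    """
    def mm_source():
        m0, m1, r = mmr_source()
        c0 = [enc_r(pk_r, m, ri) for m, ri in zip(m0, r)]
        c1 = [enc_r(pk_r, m, ri) for m, ri in zip(m1, r)]
        return c0, c1
    return mm_source
```

Both ciphertext vectors reuse the same coins, exactly as the original source pairs one randomness vector with two message vectors.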


PAD-THEN-DETERMINISTIC. The security of the PtD scheme is more difficult to
establish. The IND-CDA security is inherited immediately from the PRIV security
of the $\mathcal{AE}_d$ scheme. Here the challenge is, in fact, proving IND-CPA security.
For this we will need a stronger assumption on the underlying deterministic
encryption scheme, namely that it is a u-LTDF.


Theorem 3. [PtD is non-adaptive H-IND secure]. Let $\mathcal{AE}_d = (\mathcal{P}_d, \mathcal{K}_d, \mathcal{E}_d,
\mathcal{D}_d)$ be a deterministic encryption scheme with message length $n_d(\cdot)$. Let $\mathcal{AE}$ =
PtD[$\mathcal{AE}_d$] = $(\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be the PKE scheme defined in Section[]] with message
length $n(\cdot)$ and randomness length $\rho(\cdot)$ such that $n(k) = n_d(k) - \rho(k)$ for all $k$.
• (IND-CPA) Let $\mathcal{K}_l$ be a universal-inducing $(n_d, \ell)$-lossy key generation
algorithm for $\mathcal{AE}_d$. Let A be an IND-CPA adversary. Then there exists a LOS
adversary B such that for all $k$


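$$\mathbf{Adv}^{\text{ind-cpa}}_{\mathcal{AE},A}(k) \le \mathbf{Adv}^{\text{los}}_{\mathcal{AE}_d,\mathcal{K}_l,B}(k) + \sqrt{2^{3n(k)-\ell(k)+2}}$$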


B runs in time that of A.


• (IND-CDA) Let A be a CDA adversary that makes one LR query consisting
of a $v(\cdot)$-vector $(\mu, n, \rho)$-mmr-source (resp. block-source). Then there exists
a PRIV adversary B such that for all $k$


$$\mathbf{Adv}^{\text{ind-cda}}_{\mathcal{AE},A}(k) \le \mathbf{Adv}^{\text{priv}}_{\mathcal{AE}_d,B}(k)$$


where B runs in time that of A and makes one LR query consisting of a
$v(\cdot)$-vector $(\mu, n_d)$-mm-source (resp. block-source).


One might think that IND-CPA security can be concluded just from PtD being
IND-CDA secure, since the padded randomness provides high min-entropy. However,
this approach does not work because an IND-CPA adversary expects knowledge
of the public key before making any LR queries, while a CDA adversary
only learns the public key after making its LR queries. This issue is discussed
in more detail in J. We use a different approach (which may be of independent
interest) to prove this part of Theorem 3; the details are given in the full
version [7]. Our proof strategy, intuitively, corresponds to using the standard LHL
$2^{n(k)}$ times, once for each possible message the IND-CPA adversary might query.
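
For reference, the standard Leftover Hash Lemma (LHL) mentioned above is usually stated as follows. This is a textbook formulation rather than a quotation from this paper, and the symbols $\mathcal{H}$, $X$, $U_t$, $\Delta$ and the output length $t$ are notation assumed here.

```latex
% Let \mathcal{H} be a universal family of functions h : \{0,1\}^n \to \{0,1\}^t,
% let X be a random variable on \{0,1\}^n with min-entropy at least \mu,
% and let h be drawn uniformly from \mathcal{H}, independently of X.
% Then, writing \Delta for statistical distance and U_t for the uniform
% distribution on \{0,1\}^t,
\[
  \Delta\bigl( (h, h(X)),\ (h, U_t) \bigr) \;\le\; \tfrac{1}{2}\sqrt{2^{\,t-\mu}} .
\]
```

Applying such a statement once per possible message and taking a union bound is what makes the number of messages appear in the exponent of the resulting bound.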


6 Anonymity for Chosen Distribution Attacks 


In the previous section we proved non-adaptive security for the RtD and PtD con- 
structions. But, as established in Section B] we actually want to meet the stronger 
goal of adaptive security. In the adaptive setting, adversaries can make multiple 
LR queries, specifying sources that are generated as a function of previously-seen 
ciphertexts. Recall that one reason adaptivity is difficult to achieve is because ci- 
phertexts might leak information about the public key. In turn, knowledge of the 
public key leads to trivial IND-CDA attacks. This suggests a natural relationship 
with key privacy, also called anonymity [4]. Anonymity requires (informally) that 
ciphertexts leak no information about the public key used to perform encryp- 
tion. In this section we formalize a notion of anonymity for chosen-distribution 
attacks. In the next section we’ll use this definition as a step towards adaptive 
IND-CDA security. 


DEFINITIONS. Let $\mathcal{AE} = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be an encryption scheme. Game $\mathrm{ANON}_{\mathcal{AE},k}$
shown in Figure 3 provides the adversary with two oracles. An ANON adversary A is
one whose queries are all mr-sources. The advantage of ANON adversary A is


$$\mathbf{Adv}^{\text{anon}}_{\mathcal{AE},A}(k) = 2 \cdot \Pr[\, \mathrm{ANON}^{A}_{\mathcal{AE},k} \Rightarrow \text{true} \,] - 1.$$


We say that a PKE scheme $\mathcal{AE}$ with message length $n(\cdot)$ and randomness length
$\rho(\cdot)$ is ANON secure for $(\mu, n, \rho)$-mr-sources if for all PT adversaries A that
only query $(\mu, n, \rho)$-mr-sources the function $\mathbf{Adv}^{\text{anon}}_{\mathcal{AE},A}(\cdot)$ is negligible. We can
extend this notion to mr-block-sources in the obvious way. In the special case
that the randomness length of $\mathcal{AE}$ is always zero, the ANON definition formalizes
anonymity for deterministic encryption or, equivalently, trapdoor functions,
generalizing a definition from W.


Discussion. Anonymity for PKE in the sense of key privacy was first formalized
by Bellare et al. [4], but their notion (analogously to traditional semantic
security) only works in the context of good randomness. The ANON notion,
akin to IND-CDA, formalizes key privacy in the face of bad randomness. While
proc. Initialize(k):
  par ←$ P(1^k)
  (pk_0, sk_0) ←$ K(par)
  (pk_1, sk_1) ←$ K(par)
  a ←$ {0, 1}
  Ret par

proc. Enc(M):
  If pkout = true Ret ⊥
  (m, r) ←$ M(1^k)
  Ret E(pk_a, m; r)

proc. LR(M):
  (m, r) ←$ M(1^k)
  c ← E(pk_a, m; r)
  pkout ← true
  Ret (pk_0, pk_1, c)

proc. Finalize(a'):
  Ret (a = a')


Fig. 3. Game ANON_{AE,k}


we will use it mainly as a technical tool to simplify showing that schemes meet 
adaptive IND-CDA, it is also of independent interest as a new security target 
for PKE schemes when key privacy is important. (That is, one might want to 
hedge against bad randomness for anonymity as well as message privacy.) 
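
To make the oracle interface of the ANON game concrete, here is a minimal Python sketch. It rests on several assumptions: parameter generation is folded into `keygen`, sources are zero-argument callables returning a pair (m, r), and the Enc oracle is read as encrypting under the challenge key, as LR does. The class and method names are hypothetical.

```python
import secrets

class AnonGame:
    """Sketch of game ANON for a scheme given as (keygen, encrypt) callables."""

    def __init__(self, keygen, encrypt):
        self.encrypt = encrypt
        self.keys = [keygen(), keygen()]   # (pk_0, sk_0) and (pk_1, sk_1)
        self.a = secrets.randbelow(2)      # hidden challenge bit
        self.pkout = False                 # set once the public keys are revealed

    def enc(self, source):
        """Enc oracle: unavailable after the LR query has revealed the keys."""
        if self.pkout:
            return None                    # stands in for the failure symbol
        m, r = source()
        return self.encrypt(self.keys[self.a][0], m, r)

    def lr(self, source):
        """LR oracle: returns both public keys and a ciphertext under pk_a."""
        m, r = source()
        c = self.encrypt(self.keys[self.a][0], m, r)
        self.pkout = True
        return self.keys[0][0], self.keys[1][0], c

    def finalize(self, guess):
        """The adversary wins if it guesses which key was used."""
        return guess == self.a
```

A scheme whose ciphertexts contain the public key is trivially broken in this game, which is a convenient sanity check for the interface.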


7 Adaptive Hedge Security 


The following theorem, whose proof appears in the full version [7], shows that 
achieving ANON security and non-adaptive IND-CDA security are sufficient for 
achieving adaptive IND-CDA security. 


Theorem 4. Let $\mathcal{AE} = (\mathcal{P}, \mathcal{K}, \mathcal{E}, \mathcal{D})$ be an encryption scheme with message
length $n(\cdot)$ and randomness length $\rho(\cdot)$. Let A be an IND-CDA adversary making
$q(\cdot)$ LR queries, each being a $v(\cdot)$-vector $(\mu, n, \rho)$-mmr-source (resp. block-source).
Then there exist IND-CDA adversary B and ANON adversary C such
that for all $k$


$$\mathbf{Adv}^{\text{ind-cda}}_{\mathcal{AE},A}(k) \le 2q(k) \cdot \mathbf{Adv}^{\text{ind-cda}}_{\mathcal{AE},B}(k) + 4q(k) \cdot \mathbf{Adv}^{\text{anon}}_{\mathcal{AE},C}(k).$$


B makes one LR query consisting of a $v(\cdot)$-vector $(\mu, n, \rho)$-mmr-source (resp. block-source).
C makes at most $q(k) - 1$ Enc queries and one LR query, all these consisting
of $v(\cdot)$-vector $(\mu, n, \rho)$-mr-sources (resp. block-sources). Both B and C run
in the same time as A.


Given a non-adaptively IND-CDA secure scheme, Theorem 4 reduces the task of
showing it adaptively secure to that of showing it meets the ANON definition.
Of course, ANON is still an adaptive notion. (Adversaries can formulate their 
LR query to be a source that’s a function of previously seen ciphertexts.) Nev- 
ertheless, it formalizes a sufficient condition for adaptive CDA security of any 
PKE scheme and captures the relationship between adaptivity and anonymity. 
We believe this is an interesting (and novel) application of anonymity. 

We can show that our random oracle scheme REwH is ANON secure when the
underlying randomized scheme meets the traditional notions of anonymity for
PKE [4]. We also want to show that the RtD and PtD schemes are ANON secure.
We first show something more general: that any u-LTDF is anonymous. Then, 
that RtD and PtD are anonymous follows when using deterministic schemes that 
are also u-LTDFs. 
UNIVERSAL LTDFS ARE ANONYMOUS. Intuitively u-LTDFs are anonymous because
the lossy mode admits a universal hash, implying that no information
about the public key is leaked by outputs (generated from sources with high
conditional min-entropy). One might expect that formalizing this intuition would
follow from straightforward application of the Leftover Hash Lemma (LHL) [26].
However our anonymity definitions are adaptive, so one cannot apply the LHL
(or even the generalized LHL [[%]) directly. Rather, we first show that an adaptive
variant of the LHL is implied by the standard LHL via a hybrid argument. Here
we use it to prove the following theorem; details of both steps appear in the
full version [7].


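Theorem 5. Let $\mathcal{AE}_d = (\mathcal{P}_d, \mathcal{K}_d, \mathcal{E}_d, \mathcal{D}_d)$ be a (deterministic) encryption scheme
with message length $n(\cdot)$ and an associated universal-inducing $(n, \ell)$-lossy key
generator $\mathcal{K}_l$. Let A be an ANON adversary making $q(\cdot)$ Enc queries and a single LR
query, all of these being $v(\cdot)$-vector $(\mu, n)$-m-block-sources. Then there exists LOS
adversary B such that for all $k$


$$\mathbf{Adv}^{\text{anon}}_{\mathcal{AE}_d,A}(k) \le 2 \cdot \mathbf{Adv}^{\text{los}}_{\mathcal{AE}_d,\mathcal{K}_l,B}(k) + 3 \cdot q(k) \cdot v(k) \cdot \sqrt{2^{n(k)-\ell(k)-\mu(k)}}$$


B runs in time that of A.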
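
To get a feel for the statistical term in Theorem 5, it helps to evaluate it in the log domain. The sketch below reads that term as 3·q(k)·v(k)·√(2^(n(k)−ℓ(k)−μ(k))), which is an assumption about the printed formula; the function name and the sample parameters are hypothetical and chosen only for illustration.

```python
import math

def log2_statistical_term(q: int, v: int, n: int, ell: int, mu: int) -> float:
    """Return log2 of 3 * q * v * sqrt(2 ** (n - ell - mu)).

    q:   number of Enc queries
    v:   vector length of each source
    n:   message length in bits
    ell: lossiness parameter of the lossy key generator
    mu:  min-entropy of each source component
    """
    return math.log2(3 * q * v) + (n - ell - mu) / 2

if __name__ == "__main__":
    # Hypothetical numbers: 1024-bit inputs, 768 bits of lossiness,
    # 512 bits of min-entropy, 2**10 queries of single-message sources.
    print(log2_statistical_term(q=2**10, v=1, n=1024, ell=768, mu=512))
```

Working with logarithms avoids floating-point underflow; the term is small whenever the min-entropy exceeds the residual output length n − ℓ by a comfortable margin.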


Consider RtD and PtD when instantiated with a deterministic encryption scheme
that is a u-LTDF. We can apply Theorem 5 to conclude ANON security for
both schemes. Combining this with Theorems 2 and 4 yields proof of adaptive
hedge security for RtD. Likewise, combining it with Theorems 3 and 4 yields
proof of adaptive hedge security for PtD. Also Theorems 4 and 5 combine with
Th. 5.1] to give the first adaptively-secure deterministic encryption scheme
(based on u-LTDFs).


REwH2 IS ADAPTIVELY SECURE. As shown above, we can get adaptive security
from REwH when the underlying IND-CPA randomized scheme is anonymous
in the sense of [4]. We observe that scheme REwH2 is adaptively secure when
instantiated with any IND-CPA randomized scheme (not just anonymous ones).
To show this, we give a direct proof in the full version [7]. Since popular encryption
schemes such as RSA are not anonymous, we believe scheme REwH2 could
be relevant in practice. That being said, we still think REwH1 is important since
non-adaptive security is still a strong notion, and the scheme does not require
any changes to the structure of the public key.


EXTENSIONS. In the full version [7] we discuss extensions and variants of RtD
and PtD, where we improve the (adaptive) concrete security and show how to
securely use LTDFs that are not necessarily universal.
Abstract. Secure multi-party computation has been considered by the
cryptographic community for a number of years. Until recently it has 
been a purely theoretical area, with few implementations with which to 
test various ideas. This has led to a number of optimisations being pro- 
posed which are quite restricted in their application. In this paper we 
describe an implementation of the two-party case, using Yao’s garbled 
circuits, and present various algorithmic protocol improvements. These 
optimisations are analysed both theoretically and empirically, using ex- 
periments of various adversarial situations. Our experimental data is 
provided for reasonably large circuits, including one which performs an 
AES encryption, a problem which we discuss in the context of various 
possible applications. 


1 Introduction 


That secure multi-party computation can be executed at all is considered one 
of the main results of the theory of cryptography. Starting with Yao’s seminal 
work Bd many authors have looked at various optimisations and extensions to 
the basic concept, for both the two-party and the multi-party settings, see for 
example i mi ki Lg, pd, E! Bg. Until recently all work on secure multi-party 
computation has been essentially of a theoretical nature, focusing on feasibility 
results. However in the last few years a number of practical implementations 
have appeared B, p, B, pd pd. 

There are many different protocols for secure multi-party computation. Our 
work focuses on implementation of secure computation and therefore we only 
mention protocols which have been previously implemented. Secure multi-party 
computation essentially comes in two flavours. The first approach is typically 
based upon secret sharing and operates on an arithmetic circuit representation
of the computed function, such as in the BGW (Ben-Or, Goldwasser and Wigder- 
son) or CCD (Chaum, Crepeau and Damgard) protocols HB. This approach is 
usually applied when there is an honest majority among the participants (which 
can only exist if more than two parties participate in the protocol). An alter- 
native approach represents the function as a binary circuit. This approach was 
used in the original two-party garbled circuit construction of Yao at and in 
the GMW (Goldreich, Micali and Wigderson) multi-party protocol 

The arithmetic circuit method is better at representing addition and multipli- 
cation operations, where parties have additive shares of secret values, but cannot 
be used to compute comparisons unless the shares are converted to shares of the 
binary representation of the values. This approach has been used to great effect 
in the SIMAP project fal, which has resulted in a “real-life” application of secure 
multi-party computation to the Danish sugar beet industry [5]. 

The binary circuit approach handles arithmetic operations, especially mul- 
tiplications, less efficiently, but can easily compute binary operations such as 
comparisons. This second approach, which forms the basis of Yao’s construction 
for the two party case, has been implemented by Malkhi et al. in the Fairplay 
system RA. That system also provides a method to compile a given functionality 
from a representation in a high-level language into a circuit, which is then in- 
terpreted by a run-time environment that performs the secure evaluation of this 
functionality. FairplayMP, an extension of Fairplay to the case of more than two 
parties using a modified version of the protocol of Beaver et al. B has recently 
been released B. All these implementations provide security against semi-honest 
adversaries only. A major advantage of the binary circuit based systems (Fair- 
play and FairplayMP) is that they run in a constant number of communication 
rounds, whereas the SIMAP system has the advantage of being able to process 
arithmetic operations very efficiently. 

Efficient extensions of Yao’s construction to more relevant adversarial models 
have been a topic of research interest in the last few years. There are several 
constructions which aim to secure the protocol against malicious adversaries 
without using generic zero-knowledge protocols. We will focus on the construc- 
tion of Lindell and Pinkas which is efficient and provides fully simulatable 
security according to the definition of Canetti mE. A definition of a weaker class 
of corruption, “covert adversaries”, and a protocol secure against this type of 
behavior, was provided by Aumann and Lindell fi}. In an implementation 
of the basic Lindell—Pinkas protocol was reported upon and experimental data 
in various security models was provided. 


1 This construction may be preferable over other two-party protocols with secu- 
rity against malicious adversaries. The construction of Mohassel and Franklin 
only protects privacy and is not fully simulatable. The construction of Jarecki and 
Shmatikov fia) requires the use of public-key operations, rather than symmetric key 
operations, for any gate of the circuit. The construction of Nielsen and Orlandi [26], 
too, uses public key operations, or rather public-key based commitments, for each 
key of every wire of the circuit. A precise practical comparison between the different 
approaches is beyond the scope of the current paper. 
In this paper we improve on the implementation of BA in a number of ways.
The resulting set of quantitative improvements results in qualitative conclu- 
sions: (1) We demonstrate that two-party computation, secure against malicious 
adversaries, is truly practical, and we experimentally identify the performance 
bottlenecks which remain after our optimisations. This result should direct fur- 
ther research to the issues which have the largest effect on performance. (2) We 
experiment with a secure computation of the AES standard, and show that it 
is indeed feasible, even with security against malicious adversaries. There are a 
number of applications of such an implementation, some of which we describe be- 
low. (3) We provide the first implementation of a protocol with security against 
covert adversaries and we compare the performance of all 3 types of protocols: 
malicious, covert and semi-honest. 

A more detailed summary of our main results is as follows: 


— We improve the communication cost for transmitting the circuits between the
parties. In the case when we model the underlying key derivation functions
(KDFs) as correlation robust (see discussion below), using the technique of
td we are able to transmit no information for the XOR gates within the
circuit. In this situation we are also able to reduce the data which needs to
be sent by 25% for the other gates. When we are not willing to model the
KDFs as correlation robust, and we only assume they are pseudo-random
functions, we are unable to perform the free XOR optimisation. However
we are able to reduce the communication cost for all gates by 50%. Unlike
other methods used to improve communication, like (ca), our improvement
makes a marginal impact on computational costs. We will return to this in
a later section.

— In addition to the theoretical analysis we provide experimental data for
evaluating “real life” circuits, in the honest-but-curious, covert and malicious
adversary cases, and also for the two different methods in the literature that
construct the auxiliary circuits in the covert and malicious cases (see BA and
the full version). The implementation for the malicious setting is based on
the construction of Lindell and Pinkas which provides security in the
sense of full simulatability. Therefore the resulting construction can be used
as a black-box primitive in more complex applications. The use of our
optimisations results in a considerable performance boost compared to previous
experimental results published in bd.

Our optimisations change the performance bottleneck to a different part 
of the computation; namely, the verification of garbled circuits generated 
by the circuit constructor. This observation is important for focusing future 
research on the issues that affect the overhead the most. 

— We experiment with secure evaluation of a circuit which computes an AES 
encryption of a single block. The secure computation of AES involves one 
party which knows the key, and a different party which has an input block. 
The second party learns the encryption of the block, while the first party 
learns nothing. We demonstrate the feasibility of computing this function in 
the semi-honest, covert and malicious settings. 
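
The first item in the list above mentions sending no information for XOR gates. A minimal Python sketch of the usual free-XOR idea follows, under the assumption that the cited technique matches the common formulation in which the two keys of every wire differ by one global secret offset. The names `R`, `new_wire`, `garble_xor` and `eval_xor` and the 16-byte key length are choices made for this illustration, and the external (permutation) values of the wires are left out.

```python
import secrets

T = 16  # key length in bytes (assumed)

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Global offset, known only to the circuit constructor.
R = secrets.token_bytes(T)

def new_wire():
    """Return (k0, k1) for a fresh wire, with k1 = k0 XOR R."""
    k0 = secrets.token_bytes(T)
    return k0, xor(k0, R)

def garble_xor(wire_a, wire_b):
    """Keys of the output wire of an XOR gate; no table is produced."""
    k0 = xor(wire_a[0], wire_b[0])
    return k0, xor(k0, R)

def eval_xor(key_a: bytes, key_b: bytes) -> bytes:
    """Evaluator side: XOR the two keys it holds to get the output key."""
    return xor(key_a, key_b)
```

Because the offset cancels or survives according to the parity of the two input bits, the evaluator obtains the correct output key without any table entry being transmitted for the gate.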
Secure evaluation of AES has an impact in a number of scenarios which we will
discuss in short here and elaborate on in the full version. The fact that a secure 
computation of AES is feasible, and can run in a matter of seconds, is quite 
surprising. 


Application 1, OPRF: A secure computation of a pseudo-random function, 
denoted OPRF for “oblivious prf”, has been defined in [d for the purpose of 
secure keyword based searches, and was subsequently used in different applica- 
tions. The OPRF protocol in ig) is based on the Naor-Reingold prf, which is a 
number theoretic construction. Our construction has different advantages over 
the NR based construction, which we detail in the full version. 


Application 2, Side Channel Protection: In LA the authors introduce 
“one-time programs”, which are programs that can only be executed once and 
then “self-destruct”. An important advantage of this construction is that the 
execution of the program reveals no side-channel information. Most of the com- 
putation in that construction is essentially done using a garbled Yao circuit. 
One of the main applications of smart cards is to compute symmetric encryp- 
tions, and therefore the ability to compute AES encryptions by Yao circuits has 
immediate application in the above scenario. It enables smart cards to perform 
a one-time computation, secure against side-channel attacks, of AES. This is 
particularly interesting since in that setting the circuit evaluation need only be 
secure against semi-honest adversaries, while we show below that semi-honest 
computation of AES can be run very efficiently, taking only a few seconds. 


Application 3, Blind MACs and Blind Encryption: One can think of the 
operation of obtaining the AES encryption of a message, under the other party’s 
secret key, as a blind MAC or a blind symmetric encryption. These operations 
have different applications in secure computation. 


Application 4, Third Party Operations on Encrypted Data: We essen- 
tially show that encryption and decryption can be implemented using circuits. 
This enables secure computation of homomorphic operations on encrypted data. 
This operation is done by a circuit which receives two ciphertexts from one party 
and a key from the other party, decrypts the ciphertexts, applies some arbitrary 
mathematical operation to the plaintexts, and then encrypts the result. 


2 Yao’s Garbled Circuit Construction 


Two-party secure function evaluation makes use of the famous garbled circuit 
construction of Yao Bd which we briefly overview in this section. The basic 
idea is to encode the function to be computed via a binary circuit and then to 
securely evaluate the circuit on the players’ inputs. 
2.1 Garbled Circuits


We consider two parties, denoted as $P_1$ and $P_2$, who wish to securely compute a
function which is represented as a simple binary circuit. First assume the circuit
consists of only a single gate with two input wires and one output wire. We
denote the input wires by $w_1$ and $w_2$, and the output wire by $w_3$. The input to
$w_1$ is denoted by $b_1$ and is known to $P_1$; similarly $P_2$ knows the input to $w_2$, and
this is given by $b_2$. Each gate has a unique identifier Gid; this enables a circuit
fan out of greater than one, i.e., it enables the output wire of one gate to be used
in more than one other gate. We require that $P_2$ evaluates the gate on the two
inputs, without $P_1$ learning anything, and without $P_2$ determining the value $b_1$,
bar what it can deduce from the output of the gate and its own input. We define
the output of the gate by the function $G(b_1, b_2) \in \{0,1\}$.

The construction of Yao works as follows. $P_1$ encodes, or garbles, each wire
$w_i$ by selecting two different cryptographic keys $k_i^0$ and $k_i^1$ of length $t$. Here $t$ is
a computational security parameter which suffices as the key length of a symmetric
encryption scheme. A random permutation $\pi_i$ of $\{0,1\}$ is associated to each wire.
The garbled value of wire $w_i$ is then represented by $k_i^{b_i} \,\|\, c_i$, where $c_i = \pi_i(b_i)$.
We call the value $c_i$ the “external value” of the wire; note that this value is
completely independent of the actual value of the wire $b_i$.

An encryption function $E^{s}_{k_1,k_2}(m)$ is selected which has as input two keys
of length $t$, a message $m$, and some additional information $s$. The additional
information $s$ must be unique per invocation of the encryption function, i.e., it
is used only once for any choice of keys. The gate itself is then replaced by a
four entry table indexed by the values of $c_1$ and $c_2$, and given by


$$c_1, c_2 \;:\; E^{\mathrm{Gid}\|c_1\|c_2}_{k_1^{b_1},\,k_2^{b_2}}\left( k_3^{G(b_1,b_2)} \,\|\, c_3 \right),$$


where $c_1 = \pi_1(b_1)$, $c_2 = \pi_2(b_2)$, and $c_3 = \pi_3(G(b_1, b_2))$. Each entry in the table
corresponds to a combination of the values of the input wires and contains the
encryption of the corresponding garbled output value. The resulting look up
table, or set of look up tables in general, is called the “garbled circuit”.

Player $P_1$ then sends to $P_2$ the garbled circuit, the key corresponding to its
input value $k_1^{b_1}$, the value $c_1 = \pi_1(b_1)$, and the permutation $\pi_3$. The parties
engage in an oblivious transfer (OT) protocol so that $P_2$ learns the value of
$k_2^{b_2} \,\|\, c_2$, where $c_2 = \pi_2(b_2)$. Player $P_2$ can then decrypt the entry in the look up
table indexed by $(c_1, c_2)$ using $k_1^{b_1}$ and $k_2^{b_2}$, revealing the value of $k_3^{G(b_1,b_2)} \,\|\, c_3$. $P_2$
determines the value of $G(b_1, b_2)$ by using the mapping $\pi_3$ from $c_3$ to $\{0,1\}$.
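
The single-gate protocol just described can be sketched in Python as follows. The sketch is illustrative and makes several assumptions: the encryption function is instantiated with SHA-256 pads in the style the paper adopts later in Section 2.2, keys are 16 bytes, each permutation of {0,1} is represented by one flip bit, and the oblivious transfer step is skipped by handing the evaluator its key directly. All function names are hypothetical.

```python
import hashlib
import secrets

T = 16  # key length t in bytes (assumed)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def enc(k1: bytes, k2: bytes, s: bytes, m: bytes) -> bytes:
    """E^s_{k1,k2}(m): XOR m with two hash-derived pads (its own inverse)."""
    pad1 = hashlib.sha256(k1 + s).digest()[:len(m)]
    pad2 = hashlib.sha256(k2 + s).digest()[:len(m)]
    return xor(m, xor(pad1, pad2))

def garble_gate(gid: bytes, G):
    """Garble one 2-to-1 gate G; return the wire keys, flip bits and table."""
    keys = [[secrets.token_bytes(T) for _ in range(2)] for _ in range(3)]  # keys[i][b]
    perm = [secrets.randbelow(2) for _ in range(3)]   # pi_i(b) = b XOR perm[i]
    table = {}
    for b1 in (0, 1):
        for b2 in (0, 1):
            c1, c2 = b1 ^ perm[0], b2 ^ perm[1]
            out = G(b1, b2)
            c3 = out ^ perm[2]
            s = gid + bytes([c1, c2])                 # Gid || c1 || c2
            table[(c1, c2)] = enc(keys[0][b1], keys[1][b2], s,
                                  keys[2][out] + bytes([c3]))
    return keys, perm, table

def eval_gate(gid, table, k1, c1, k2, c2, perm3):
    """Decrypt entry (c1, c2); return the output key and the real output bit."""
    plain = enc(k1, k2, gid + bytes([c1, c2]), table[(c1, c2)])
    k3, c3 = plain[:-1], plain[-1]
    return k3, c3 ^ perm3
```

The evaluator only ever touches the one table entry selected by the external values, so the other three entries stay encrypted under keys it does not hold.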
In the general case the circuit consists of multiple gates. Player $P_1$ chooses
random garbled values for all wires and uses them for constructing tables for
all gates. It sends these tables, i.e., the garbled circuit, to $P_2$ and in addition
provides $P_2$ with the garbled values and the $c$ values of $P_1$’s inputs, and with the
permutations $\pi$ used to encode the output wires of the circuit. Player $P_2$ uses
invocations of oblivious transfer to learn the garbled values and $c$ values of its
own inputs to the circuit. Given these values, $P_2$ can evaluate the gates in the
first level of the circuit, and compute the garbled values and the $c$ values of their
output wires. Player $P_2$ can then continue with this process and compute the
garbled values of all wires in the circuit. Finally $P_2$ uses the $\pi$ permutations of
the output wires of the circuit to compute the real output values of the circuit.
If $P_1$ additionally requires some output from the circuit then this can be dealt
with by standard mechanisms, as described in the full version.

One could use more general gates than 2-to-1 gates, such as n-to-m gates 
with $2^n$ entries. However the optimisations we shall present in this paper are
most effective when applied to 2-to-1 gates. While we found that more general 
gates can improve the performance of a naive Yao circuit protocol, they actually 
decrease the performance of the optimisations. Hence the rest of this paper is 
restricted to 2-to-1 gates. 


2.2 Required Implementation Details 


Having given the basic theoretical description of Yao’s protocol and its
extensions, we now present a number of implementation details which are needed
to understand some of our optimisations. The basic implementation choice of
the underlying encryption scheme to be used is the same as the implementation
described in bd.


Oblivious transfer: Unlike 2) we do not use the OT scheme of Hazay and 
Lindell (HL) iG Instead we use the OT scheme of Peikert et al. (PVW) ei. 
This scheme is UC-secure and hence requires the setup of a Common Reference 
String (CRS) of a few hundred bits. For our experiments we assume that this is 
given to the parties. (Alternatively, the parties can run a coin-tossing protocol 
to generate the CRS, which is possible due to the nature of the CRS used in the 
PVW scheme.) The batched method of PVW is more efficient per OT than the 
batched method of HL, especially on the receiver’s side. In particular the CRS 
can be used for any number of invocations of the OT, whereas the method in HL 
requires the maximum number of OT’s being executed to be known before the 
setup is performed. (The setup in HL also requires two ZK-proofs as opposed to a 
CRS being created in PVW.) The OT stage is not our computational bottleneck, 
and is unlikely to be, unless one is in the rare situation of having a circuit with 
a large number of inputs for $P_2$ and yet a relatively small number of gates.
Thus we do not consider optimisations of OT schemes which are secure against 
only semi-honest or covert adversaries, since the fully secure OT is efficient 
enough. 


Encryption scheme: The only implementation detail we will need from 
is that the encryption scheme is implemented via 


$$E^{s}_{k_1,k_2}(m) = m \oplus \mathrm{KDF}^{|m|}(k_1, k_2, s)$$
where KDF is a key derivation function, whose $|m|$ bits of output are independent
of the two input keys in isolation, and which depends on the value of $s$. We
will instantiate this function as follows


$$\mathrm{KDF}^{\ell}(k_1, k_2, s) = H(k_1 \| s)_{1 \ldots \ell} \oplus H(k_2 \| s)_{1 \ldots \ell}.$$


Even if H is a Merkle-Damgard type hash function this will be secure (with the 
associated issues of length extension), since we are only applying the function to 
fixed length inputs. Indeed, in our experiments we implement H using SHA-256. 


Modeling the hash function, and correlation robustness: In this paper 
we need to model the underlying hash function H in two ways. In the first we 
make the usual assumption that it behaves as a pseudo-random function, namely 
that H(k||s) is an invocation of a pseudo-random function keyed by k, with the 
input s. However one of our optimisations requires that we make a stronger 
assumption on the hash function, namely that it is correlation robust. This latter 
property can be stated formally as follows: 


Definition 1 (Correlation robustness [14]). An efficiently computable function
$H : \{0,1\}^* \to \{0,1\}^*$ is correlation robust if the following distribution is
pseudo-random: $(t_1, \ldots, t_m, H(t_1 \oplus r), \ldots, H(t_m \oplus r))$, where $t_1, \ldots, t_m$ and r
are chosen at random, and m is polynomial in the security parameter.


This can also be stated by saying that the function $f_r(x) = H(x \oplus r)$ is a
weak pseudo-random function. The definition also implies that the distribution
of $(H(t_1), \ldots, H(t_m), H(t_1 \oplus r), \ldots, H(t_m \oplus r))$ is pseudo-random.

The correlation-robustness assumption is satisfied by a random oracle (or
rather by a very weak form of it: a non-programmable, non-extractable random
oracle). However, assuming correlation robustness seems to be a much weaker
requirement than assuming the existence of random oracles. This assumption
was introduced in [14] and was used there for providing security against
malicious adversaries for a method of extending oblivious transfer. The
correlation robustness assumption has recently been used in the context of
oblivious transfer, and in the context of secure computation.

For our construction, as we deal with circuits with arbitrary fan out, we require
a slightly modified definition. Namely that for any set $S = \{s_1, \ldots, s_{|S|}\}$


2 In earlier work two instantiations were presented, depending on whether we are working in
the random oracle model (ROM) or standard model, via truncating, or extending,
the output of a suitable hash function H in the standard way as follows


$\mathrm{KDF}^{\ell}(k_1, k_2, s) = H(k_1 \| k_2 \| s)_{1 \ldots \ell}$ if H is modeled as an RO,

$\mathrm{KDF}^{\ell}(k_1, k_2, s) = H(k_1 \| s)_{1 \ldots \ell} \oplus H(k_2 \| s)_{1 \ldots \ell}$ if H is modeled as a PRF.


The difference is that the security analysis in the ROM works even if we feed related 
keys to different invocations of the function. Namely, it is possible to compute, say, 
$H(k_1 \| k_2)$, $H(k_1 \| k_2')$, $H(k_1' \| k_2)$ and $H(k_1' \| k_2')$ and claim that knowledge of $k_1, k_2$ does
not disclose information about any of the values except $H(k_1 \| k_2)$. This is impossible
in the standard model. Therefore if H() is modeled as a PRF it must be invoked
separately with each key.
of size which is of the same order as the number of gates, the distribution of
$(t_1, \ldots, t_m, (H((t_1 \oplus r) \| s_1), \ldots, H((t_m \oplus r) \| s_1)), (H((t_1 \oplus r) \| s_2), \ldots, H((t_m \oplus r) \| s_2)), \ldots, (H((t_1 \oplus r) \| s_{|S|}), \ldots, H((t_m \oplus r) \| s_{|S|})))$
is pseudo-random, where $t_1, \ldots, t_m$ and r are chosen at random. In other words,
all the pads that are
used for encrypting table entries are pseudo-random. If one is willing to assume 
this then our optimisations provide highly efficient protocols. We also provide 
optimisations for when the user is unwilling to make such an assumption. 


3 Structural Optimisations of the Circuit 


Yao’s protocol operates on functions which are described as a boolean circuit, and 
its overhead depends on the size of the circuit. A convenient way of generating 
a representation of a function in this form is to use a compiler which translates 
a description of a function in a high-level language to a description as a binary 
circuit. The Fairplay system provides a compiler for this task which operates on 
functions described in a high-level language called Secure Function Description 
Language (SFDL). We use that compiler as the basis of our experiments, 
but use our own run-time environment to execute the protocol. 

There are a number of general circuit simplifications which can be performed 
to the output of the Fairplay compiler. We have implemented a number of these, 
based on two basic ideas: (1) identifying component circuits which can be re- 
placed by simpler combinations of gates, and (2) identifying complicated compo- 
nents whose output must always be zero, or one; this allows for the component to 
be removed and other subsequent components to be further simplified. A com- 
bination of these techniques is surprisingly effective, and allows us to produce 
circuits which are often 60 percent more efficient than the circuit produced by 
the Fairplay compiler. 

Many of the techniques used are ad-hoc, but the following technique is partic- 
ularly effective. First, by a technique akin to common sub-expression elimination, 
we identify sets of gates which can be replaced by a single 3-to-1 gate, and then 
replace the 3-to-1 gate with a set of 2-to-1 gates which was chosen to minimize 
the number of non-XOR gates. This is particularly effective when combined with
our later technique of Section 4, in the case of correlation robust KDFs, to remove
the cost of any XOR gates; however the technique is also successful in
the more general case. We call a gate even if its truth table has an even
number of ‘1’ entries (for example, a XOR gate is even), otherwise it is called
odd (an OR gate, for example, is odd). We show in the full version that it is
possible to replace any 3-to-1 even gate with at most a single 2-to-1 non-XOR 
gate and at most three XOR gates. The optimal transformation rules, which we 
found by exhaustive search, are listed in the full version. 
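The even/odd classification used above is easy to check mechanically. The following Python sketch is purely illustrative; the helper names and the representation of a gate as a Python function on bits are assumptions made here, not part of the compiler or of the transformation rules in the full version.

```python
from itertools import product


def truth_table(gate, arity=2):
    """List the gate's output bit for every input combination."""
    return [gate(*bits) & 1 for bits in product((0, 1), repeat=arity)]


def is_even(gate, arity=2):
    """A gate is even if its truth table has an even number of 1 entries."""
    return sum(truth_table(gate, arity)) % 2 == 0


# A few illustrative gates.
XOR = lambda a, b: a ^ b
OR = lambda a, b: a | b
AND = lambda a, b: a & b
MAJ3 = lambda a, b, c: (a & b) | (a & c) | (b & c)
```

With these definitions XOR is even, OR and AND are odd, and the 3-to-1 majority gate (four 1 entries out of eight) is even, so it falls into the class of 3-to-1 gates to which the replacement result quoted above applies.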


4 Optimisations with Free XORs, When the KDF Is 
Correlation Robust 


In [19] Kolesnikov and Schneider present an optimisation based on the correlation
robustness assumption, which allows XOR gates to be evaluated for free, thus
doing away with the need to evaluate or transmit the garbled tables for such
gates. The optimisation requires that there is a global random value R of bit
length t, known only to P1, such that for all garbled wires $w_i$ it holds that
$k_i^1 = k_i^0 \oplus R$. In other words, the garbling of the 1 value of a wire is determined
purely from XOR-ing the garbled 0 value with the value R. Note that a similar
property holds for the external values of the wire: $\pi_i(1) = \pi_i(0) \oplus 1$. With this
convention we have that a XOR gate can be implemented by simply XOR-ing
together the two garbled input values, and the two external values. Namely, for
a XOR gate mapping wires $w_1$ and $w_2$ to wire $w_3$, it holds that $k_3 = k_1 \oplus k_2$ and
$c_3 = c_1 \oplus c_2$. For a full proof of this optimisation see [19]. Note that [19] states
the proof in the random oracle model, but it can be easily seen, as has been noted
elsewhere, that the proof can be based on the correlation robustness assumption.
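The free XOR convention can be illustrated with a short Python sketch. It is a toy model under stated assumptions: keys are t = 128 bit integers, a wire is represented as a dictionary holding its two keys and its two external values, and all function names are hypothetical.

```python
import secrets

T = 128  # key length t in bits (illustrative choice)


def garble_wire(R):
    """Pick k^0 and pi(0) at random; k^1 and pi(1) are then forced."""
    k0 = secrets.randbits(T)
    p0 = secrets.randbits(1)
    return {"k": (k0, k0 ^ R), "pi": (p0, p0 ^ 1)}


def garble_xor(w1, w2, R):
    """Output wire of a free XOR gate: no table is created or sent."""
    k0 = w1["k"][0] ^ w2["k"][0]
    p0 = w1["pi"][0] ^ w2["pi"][0]
    return {"k": (k0, k0 ^ R), "pi": (p0, p0 ^ 1)}


def eval_xor(k1, c1, k2, c2):
    """Evaluator side: k3 = k1 XOR k2 and c3 = c1 XOR c2."""
    return k1 ^ k2, c1 ^ c2
```

For any input bits b1 and b2, XOR-ing the keys for b1 and b2 gives the output key for b1 XOR b2, because the two copies of R cancel exactly when both inputs are 1. The evaluator never needs R.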


Garbled Row Reduction — GRR: The above solution is ideal for XOR gates, 
but in addition we would like to reduce the size of the tables of the non-XOR 
gates as well. The following simple optimisation (which has been pointed out previously)
provides a 25 percent reduction in the sizes of the tables needed to represent
two-input gates. We can do this in a way which still allows the use of the above
trick for free XOR gates. (In general, this method provides a $1/2^n$ reduction
in the size of n-to-1 gates, but we will only describe it in detail for the two
input case.)

The observation is that instead of defining the two garbled values of the output 
wires randomly, we can define one of them as a function of garbled values of the 
two input wires which result in this output value. In other words, we choose an
input pair $(b_1, b_2) \in \{0,1\}^2$, and define the garbled output value of $G(b_1, b_2)$ to
be a function of the garbled values of $b_1$ and $b_2$. The gate table therefore need
not store an entry for the input combination $(b_1, b_2)$. In the evaluation phase,
if the evaluator has the garbled values of the pair $(b_1, b_2)$ it can compute the
corresponding garbled output directly, without consulting the gate table.

Suppose the gate maps wire $w_1$ and wire $w_2$ to wire $w_3$. As before we let
$k_i^0$ and $k_i^1$ denote the garbled wire values, $G(b_1, b_2)$ denote the function being
implemented by the gate, and we set the external value of the wire to be
$c_i = \pi_i(b_i)$. We then define the garbled output value corresponding to the output
resulting from the external input values $(c_1, c_2) = (0,0)$ as


$k_3^{G(\pi_1^{-1}(0), \pi_2^{-1}(0))} \| c_3 = \mathrm{KDF}^{t+1}(k_1^{\pi_1^{-1}(0)}, k_2^{\pi_2^{-1}(0)}, \mathrm{Gid} \| 0 \| 0).$


In other words, the garbled value is exactly equal to the pseudo-random mask
that was used to hide it in the basic protocol. Note that this operation also
defines the external value $c_3$ of this output value. We therefore define $\pi_3$ such
that $c_3 = \pi_3(G(\pi_1^{-1}(0), \pi_2^{-1}(0)))$. The other garbled value of the output wire,
$k_3^{1 - G(\pi_1^{-1}(0), \pi_2^{-1}(0))}$, is then chosen as in the free XOR method above, to enable the
evaluation of XOR gates for free. The table is then constructed in the standard
way except that we do not store, or transmit, its first entry.

On evaluating the garbled gate the evaluator proceeds as in the standard
algorithm except when it wishes to access the first entry of the table, i.e., when
the external values of both input wires are 0, namely $c_1 = c_2 = 0$. In that
case it possesses the garbled values $k_1^{b_1}$ and $k_2^{b_2}$, where $b_1 = \pi_1^{-1}(0)$ and
$b_2 = \pi_2^{-1}(0)$. It uses them to compute $k_3^{G(b_1, b_2)}$ and $c_3 = \pi_3(G(b_1, b_2))$, by computing
$\mathrm{KDF}^{t+1}(k_1^{b_1}, k_2^{b_2}, \mathrm{Gid} \| 0 \| 0)$ as defined in the equation above.


We will denote this optimisation as Garbled Row Reduction, GRR for short, 
in our future discussions. 
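The following Python sketch puts the GRR steps together for a single two-input gate. It is a minimal illustration under several assumptions that are not taken from the text: keys are t = 128 bit integers, the KDF is the SHA-256 based PRF variant described earlier, a wire is a dictionary with its two keys and two external values, the gate identifier is a byte string, and every function name is hypothetical.

```python
import hashlib
import secrets

T = 128  # key length t in bits (illustrative choice)


def kdf(k1, k2, s, nbits):
    """KDF(k1, k2, s): leading nbits bits of H(k1||s) XOR H(k2||s)."""
    def h(k):
        d = hashlib.sha256(k.to_bytes(T // 8, "big") + s).digest()
        return int.from_bytes(d, "big") >> (256 - nbits)
    return h(k1) ^ h(k2)


def new_wire(R):
    k0, p0 = secrets.randbits(T), secrets.randbits(1)
    return {"k": (k0, k0 ^ R), "pi": (p0, p0 ^ 1)}


def garble_grr(G, gid, w1, w2, R):
    """Return the output wire and a table with three rows instead of four."""
    def pad(c1, c2):
        # b_i = pi_i^{-1}(c_i); pi_i is a bit flip, so it is its own inverse.
        b1, b2 = c1 ^ w1["pi"][0], c2 ^ w2["pi"][0]
        mask = kdf(w1["k"][b1], w2["k"][b2], gid + bytes([c1, c2]), T + 1)
        return G(b1, b2), mask

    g, mask = pad(0, 0)
    key_g, c3 = mask >> 1, mask & 1      # k3^g || c3 is defined to be the mask
    k0 = key_g ^ (R if g else 0)         # recover k3^0; k3^1 = k3^0 XOR R
    p0 = c3 ^ g                          # pi_3 chosen so that pi_3(g) = c3
    w3 = {"k": (k0, k0 ^ R), "pi": (p0, p0 ^ 1)}

    table = {}
    for c1, c2 in ((0, 1), (1, 0), (1, 1)):
        g, mask = pad(c1, c2)
        table[(c1, c2)] = ((w3["k"][g] << 1) | w3["pi"][g]) ^ mask
    return w3, table


def eval_grr(gid, table, k1, c1, k2, c2):
    """Evaluator: row (0,0) is recomputed from the mask, other rows are decrypted."""
    mask = kdf(k1, k2, gid + bytes([c1, c2]), T + 1)
    v = mask if (c1, c2) == (0, 0) else table[(c1, c2)] ^ mask
    return v >> 1, v & 1
```

The output wire still satisfies the free XOR relation, since its second key is derived from the first by XOR-ing with R, so gates garbled this way can feed free XOR gates.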


Security: We sketch why the above optimisation maintains security. Recall
that the existing proof of security for Yao’s protocol shows security against
a corrupt P2 based on a hybrid argument, and on a claim that for each gate it is
infeasible to distinguish between a correct garbled table of this gate and a table
which encrypts the same value in all four entries. In order for this argument to
apply to the GRR optimisation, it is required to show that it is infeasible to find
out if the garbled value assigned to the first table entry, $k_3^{G(\pi_1^{-1}(0), \pi_2^{-1}(0))}$, is
equal to the values encrypted in the other entries. However this value is equal to
the mask that is used to encrypt the first entry in Yao’s original protocol, and
we know that if a polynomial adversary is given only a single pair of garbled
input values then the masks that are used for encrypting the other entries of the
table are pseudo-random. Therefore the claim follows.


5 Optimisations without Free XORs, When the KDF Is 
Not Correlation Robust 


One may not want to assume the KDF is correlation robust, or perhaps the 
proportion of XOR gates in the circuit is so low that making this assumption is 
not as effective. In these situations, too, we would like to reduce the overhead 
required by the Yao circuit. This section describes an optimisation which reduces 
the size of every two-input gate by 50%, but which, unfortunately, cannot be
combined with the free XOR method of Section 4.

The underlying idea is that if we are not using the free XOR trick then the two
values of the output wire can be chosen independently (see footnote 3). The 50% reduction in
the size of the gate tables is based on Shamir secret sharing. It makes use of
a finite field $\mathbb{F}_{2^t}$. Recall that t is the bit length of the keys used to represent the
garbled values of the wires. We can therefore interpret keys as elements of $\mathbb{F}_{2^t}$
and vice versa. We also interpret small integers such as 1, 2, 3 etc. as elements
in $\mathbb{F}_{2^t}$. For example if we think of $\mathbb{F}_{2^t}$ as $\mathbb{F}_2[X]/(f(X))$, for some irreducible
polynomial f of degree t, then the integer 3 can be interpreted as X + 1.

As before we assume a garbled table indexed by the external values, $c_1$ and $c_2$,
and each entry corresponds to the value being output, on input of the values $k_1^{b_1}$
and $k_2^{b_2}$ where $b_i = \pi_i^{-1}(c_i)$. We set the rows of the gate table to be numbered


3 This allows for possible extensions of the GRR method, and in the full version we 
detail another optimisation method, which we call Garbled Table Reduction (GTR), 
which reduces the size for the garbled tables needed to represent odd 2-to-1 gates 
by 1/3, and the size of tables of even 2-to-1 gates by 1/2. 
1,...,4, and therefore set $r = 2c_1 + c_2 + 1$ to be the row number of table entry
$(c_1, c_2)$. We define the value used to mask this entry as


$K_r \| M_r = \mathrm{KDF}^{t+1}(k_1^{b_1}, k_2^{b_2}, s)$    (1)


where $s = \mathrm{Gid} \| c_1 \| c_2$, $K_r$ is a bit string of length t bits and $M_r$ is a single bit
used to mask the external value of the output. We use a different method for 
optimising odd and even gates. The truth table of each gate, and therefore also 
the information whether the gate is odd or even, is known to the circuit evaluator. 
Therefore it can compute each gate according to the right method. (The only 
information hidden from the evaluator is the values passing on intermediate wires 
of the circuit.) 


5.1 Odd 2-to-1 Gates 


Suppose we are implementing an OR-gate, where the external values $c_1 = 0$
and $c_2 = 0$ correspond to the real input values (0,0); the other cases will follow
immediately from the following. This means that the values r = 2, 3 and 4
should evaluate to the same output value $k_3^1$, whilst r = 1 should evaluate to the
output value $k_3^0$. We first define over $\mathbb{F}_{2^t}$ a polynomial P(X) of degree two, by
interpolating the polynomial which intersects the three points $(2, K_2)$, $(3, K_3)$
and $(4, K_4)$, where each $K_r$ value was defined according to equation (1). (This
is the value which in the other constructions was used to mask entry r of the
table.) The garbled output value $k_3^1$ is defined to be $k_3^1 = P(0)$. We also compute
$K_5 = P(5)$ and $K_6 = P(6)$. We then define a second polynomial Q(X), also of
degree two, by interpolating the polynomial which intersects the three points
$(1, K_1)$, $(5, K_5)$ and $(6, K_6)$, where $K_1$ was defined according to equation (1).
The garbled output value $k_3^0$ is now defined by $k_3^0 = Q(0)$. The garbled table is
replaced by the two values $(K_5, K_6)$. In addition, for each of the four original
rows, the external value for the output wire in the rth row is encrypted using
the bit $M_r$, defined in equation (1). The total amount of data sent for the gate
is therefore 2t + 4 bits.

Player P2 then, given two key values $k_1^{b_1}$ and $k_2^{b_2}$ plus two external values $c_1$
and $c_2$, computes, using equation (1), the value of $K_r$ and $M_r$ for $r = 2c_1 + c_2 + 1$.
Recall that the evaluator knows r but not $b_1$ or $b_2$. It then uses the two supplied
values of $K_5$ and $K_6$ to interpolate the polynomial passing through the points
$(r, K_r)$, $(5, K_5)$ and $(6, K_6)$. The result is either Q(X) or P(X), depending on
whether r = 1 or not. Player P2 then recovers the associated secret value $k_3^{G(b_1, b_2)}$ by
evaluating the polynomial at the point X = 0. Using $M_r$ the evaluator can also
decrypt the encryption of the external value of the output wire and so obtains
$c_3$. Hence the evaluator recovers the correct value of the output wire.
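The secret sharing step for an odd gate can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: t = 128, the field is represented modulo the irreducible polynomial $X^{128} + X^7 + X^2 + X + 1$, the masks $K_1, \ldots, K_4$ are passed in as a dictionary (in the protocol they come from equation (1)), the four encrypted external bits are omitted, and all function names are hypothetical.

```python
import secrets

T = 128
MOD = (1 << T) | 0x87  # X^128 + X^7 + X^2 + X + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of F_{2^T} (carry-less multiply with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> T:
            a ^= MOD
    return r


def gf_inv(a):
    """Inverse of a non-zero element, computed as a^(2^T - 2)."""
    r, e = 1, (1 << T) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def interpolate(points, x):
    """Lagrange interpolation: value at x of the polynomial through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = gf_mul(num, x ^ xj)   # subtraction is XOR in characteristic 2
                den = gf_mul(den, xi ^ xj)
        total ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return total


def garble_odd(K):
    """K maps row r in 1..4 to K_r; row 1 is the row with the differing output."""
    P = [(r, K[r]) for r in (2, 3, 4)]
    key_rows_234 = interpolate(P, 0)                     # P(0)
    K5, K6 = interpolate(P, 5), interpolate(P, 6)
    key_row_1 = interpolate([(1, K[1]), (5, K5), (6, K6)], 0)   # Q(0)
    return (K5, K6), key_row_1, key_rows_234


def eval_odd(table, r, Kr):
    """Evaluator: interpolate through (r, K_r), (5, K5), (6, K6) and read off X = 0."""
    K5, K6 = table
    return interpolate([(r, Kr), (5, K5), (6, K6)], 0)
```

The evaluator performs the same computation for every row, so the work it does reveals nothing about which polynomial it reconstructed. Only the two field elements $K_5$ and $K_6$ are transmitted for the keys, which is where the 2t part of the 2t + 4 bits comes from.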


5.2 Even 2-to-1 Gates 


The only non-trivial even 2-to-1 gates are the XOR and NXOR gate, since all
other gates can be replaced by wires. Again let us assume the external input
values $c_1 = 0$ and $c_2 = 0$ correspond to the real input values (0,0), and assume
we are dealing with a XOR gate. Then the entries 1 and 4 in the standard garbled
table will correspond to the same output key, namely $k_3^{\pi_3^{-1}(0)}$. Any other case
will follow from the following description.


Player P1 first creates a linear polynomial P(X) over $\mathbb{F}_{2^t}$ which interpolates
the two points $(1, K_1)$ and $(4, K_4)$. The value of $k_3^{\pi_3^{-1}(0)}$ is defined to be equal
to P(0). If the external value of this output value is 0 then we store P(5) into
the first row of the new table of this gate, otherwise we store P(5) as the second
entry. Then P1 creates another linear polynomial Q(X) which interpolates the
two points $(2, K_2)$ and $(3, K_3)$. The value of $k_3^{\pi_3^{-1}(1)}$ is then defined to be Q(0),
and the value Q(5) is stored in the remaining row of our new table. The external
values of the output wires are now encrypted and stored, using the $M_r$ values
as before, as a separate sub-table of 4 bits in length. Thus, the total amount of
data required to represent the gate is 2t + 4 bits.

Player P2, given two key values $k_1^{b_1}$ and $k_2^{b_2}$ plus two external values $c_1$ and
$c_2$, computes the value of $K_r$ and $M_r$. Using $M_r$ it can determine the external
value of the output wire. If this external value is zero then using the first entry
of our garbled table and the value of $K_r$, the evaluator recovers P(X) and hence
$P(0) = k_3^{\pi_3^{-1}(0)}$. If the external value is one then using the second entry of the
table and the value $K_r$, the evaluator recovers Q(X) and hence $Q(0) = k_3^{\pi_3^{-1}(1)}$.

Security: We sketch why the above optimisations maintain security. Given a
pair of garbled values of the input wires, P2 can compute a garbled output
value, but cannot distinguish the other garbled output value from random. This
is because that other garbled value is defined using a linear combination with a
value which is unknown to P2. This fact can be used in a somewhat modified
security proof, in the spirit of the existing proof of Yao’s protocol.
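The even gate construction of Section 5.2 can be sketched in the same style. As before this is only a hedged illustration: t = 128, the field polynomial $X^{128} + X^7 + X^2 + X + 1$, the dictionary of masks $K_r$, the parameter `c_p` (the external value attached to the key shared by rows 1 and 4) and all function names are assumptions introduced here, and the 4-bit sub-table of encrypted external values is not modelled.

```python
import secrets

T = 128
MOD = (1 << T) | 0x87  # X^128 + X^7 + X^2 + X + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of F_{2^T}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> T:
            a ^= MOD
    return r


def gf_inv(a):
    """Inverse of a non-zero element, computed as a^(2^T - 2)."""
    r, e = 1, (1 << T) - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def interpolate(points, x):
    """Lagrange interpolation over F_{2^T}, evaluated at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = gf_mul(num, x ^ xj)
                den = gf_mul(den, xi ^ xj)
        total ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return total


def garble_even(K, c_p):
    """Rows 1 and 4 share one output key (line P), rows 2 and 3 the other (line Q)."""
    P = [(1, K[1]), (4, K[4])]
    Q = [(2, K[2]), (3, K[3])]
    key_p, key_q = interpolate(P, 0), interpolate(Q, 0)
    table = [None, None]
    table[c_p] = interpolate(P, 5)        # stored at the row given by its external value
    table[1 - c_p] = interpolate(Q, 5)
    return table, key_p, key_q


def eval_even(table, r, Kr, c3):
    """Evaluator: the line through (r, K_r) and (5, table[c3]), read off at X = 0."""
    return interpolate([(r, Kr), (5, table[c3])], 0)
```

Here the evaluator is handed `c3` directly; in the protocol it would first obtain it by decrypting the rth bit of the sub-table with $M_r$.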


6 Some Experimental Results 


We now present some experimental results. In our results we separate out pre- 
computation time, i.e., generating the required garbled circuits, from the rest 
of the computation. This is because it depends on the application whether one 
should consider this time as part of the computation time or not. 

There are two major conclusions of our experiments. Firstly, assuming the 
KDF is correlation robust then the GRR optimisation produces the most ef- 
ficient implementation. Secondly we conclude that rather large circuits can be 
practically evaluated using the methods described. Thus secure two-party com- 
putation has become more of a reality than one might previously have thought. 


Example 1 — Evaluating a Simple Circuit: First we present results for a
simple circuit, where we took the circuit for which each of P1’s and P2’s input is
a 32-bit integer. The output for P2 should be the single bit resulting from the
application of the comparison operator on the inputs. The output for P1 will be
a six bit integer resulting from the scalar product of the bits of the two inputs,
i.e. the number of ones in the string obtained from forming the bit-wise “and”
of the two strings.

Table 1. Experimental Results for Example 1 (times are in seconds)

Columns: Adversary, Input Enc., Method, No. of gates, % XOR Gates, Precomp Time,
Send Time, OT Time, Calc Time, Total Time, Total KBytes.

Adversary  Input Enc.     Method    Total KBytes
Malic.     Indep. Inputs  Base      180599
Malic.     Indep. Inputs  PRF-SS    173942
Malic.     Indep. Inputs  CoR-GRR   164323
Malic.     Indep. Inputs  ROM-GRR   161741
Malic.     Random Comb.   Base      167276
Malic.     Random Comb.   PRF-SS    158904
Malic.     Random Comb.   CoR-GRR   140265
Malic.     Random Comb.   ROM-GRR   137609

Applying the Fairplay compiler to this functionality we obtain a circuit with
689 gates. We produce two circuits from this output; the first, denoted $C_{2,3}$, is
to allow comparison with the existing state of the art. This is a circuit which
uses 2-to-1 and 3-to-1 gates and has 245 gates. The
second circuit we use, denoted $C_{XOR}$, replaces, via the techniques of Section 3, all
complex gates with 2-to-1 gates, and tries to minimise the number of non-XOR
gates in the circuit. This circuit has 531 gates, 240 of which are non-XOR gates.
An extra six gates are needed in each circuit so as to encode P1’s output for
transmission back to P1, without P2 learning the value.

The above circuit sizes are purely to implement the functionality; they do
not include the extra wires and gates required to transmit P1’s output back
to P1 (for details of how this is done see the full version), nor do they include
the extension of the circuit to cope with P2’s input in the case of Covert and
Malicious adversaries. (We refer to the two methods for encoding P2’s input as
the independent inputs and the random combinations methods. For the details of
these methods see the full version. These methods add a set of XOR gates
to the circuit, which transform P2’s inputs using a random linear encoding.) The
sizes of the extended circuits, and the resulting run-times, are given in Table 1,
which measures the total elapsed wall times in seconds for the various cases.
The calculations were performed on two machines with Intel Core 2 Duo processors
running at 3.0 GHz, with 4 GB of RAM, connected by 1 Gbit Ethernet. The hash
function H() used in the protocol was implemented as SHA-256.

The column of “Total KBytes” contains the total number of kilobytes of data 
which were transferred during the run of the protocol. The column “Method” 
details the type of computation used, as follows: 


— Base: Denotes previously proposed optimisations, extended to the case of
Covert and Honest adversaries, which we use for comparison purposes, as
our baseline implementation. This uses the $C_{2,3}$ circuit mentioned above,
the KDF which is secure in the standard model, and the OT of Hazay-Lindell
as opposed to that of Peikert et al.

— PRF-SS: This denotes using the secret sharing based method of Section 5
to reduce the size of the garbled tables. For this the KDF is assumed to be
a PRF, but not correlation robust.

— CoR-GRR: This denotes an implementation which is only secure assuming
the KDF is correlation robust. It uses the free XOR trick and the method of
Garbled Row Reduction, from Section 4, to reduce the size of the remaining
garbled tables.

— ROM-GRR: As above for CoR-GRR but all hash functions used are modelled
as random oracles. This means we can implement our KDF via a single hash
function call, based on the method described in Footnote 2.


The column denoted “No. of gates” describes the number of gates, and the
percentage of XOR gates, in the extended circuit (which transfers P1’s outputs
and applies the extension described in the full version, encoding P2’s input).

For the Covert and Malicious cases the “Input Enc.” column denotes whether 
we use the Independent Inputs technique or the Random Combinations technique 
for the extended circuit construction. See the full version for details. From the 
table we can deduce the following conclusions: 


— The running time in the semi-honest setting is about 10-20 times faster than 
in the covert setting, which is in turn about 15-20 times faster than in the 
malicious setting. 

— A lot of the extra data needed to be transmitted in the Malicious case is 
related to the large number of commitments and decommitments which need 
to be transmitted. Thus our optimisation techniques are less effective in 
the Malicious case. This points to a clear direction for future research in 
optimising the Malicious case. 

— If one is not willing to assume that the KDF is correlation robust we see 
that using our technique based on secret sharing can reduce the amount of 
data being transmitted, compared to the base scheme, without increasing 
the computational cost. 

— In all cases we see that the correlation robust variant using Garbled-Row- 
Reduction is the most efficient variant. The extra efficiency comes from the 
free XOR’s which reduce both the number of encryption/decryptions which 
need to be performed and also the amount of data needing to be transmitted. 
— Note that if we assume the random oracle model, and so can implement our
KDF via a single hash function call, then for Covert adversaries the protocols
run significantly faster. That this does not apply as much to the Malicious
case is due to the fact that most of the time in the Malicious case is spent
creating, sending and verifying the various commitments.


We pause to compare our two optimisations with a previously suggested
optimisation in bandwidth. In our system P1, the circuit constructor, sends commitments
to all circuits that it constructs and to its own inputs, and a random subset
of these committed values are checked by P2. The alternative suggestion is that P1
commits to a random seed, and uses this to generate the circuit. Then only
the commitment to this seed, and eventually its decommitment, need to be
transmitted. This means that P2 needs to compute the circuit given the seed.
Whilst this optimisation clearly significantly reduces the consumed bandwidth, it
actually leads to a significant increase in the time needed to perform the protocol.
To see this consider our Covert experiments in Table 1. The seed-based optimisation
would reduce practically to zero the entry for the “Send Time” column, but P2
would now need to recompute almost all of the calculations in the “Precomp
Time” column. Thus that technique is only to be compared to ours in the
situation where bandwidth is very expensive and CPU time is very cheap.

Before passing on to our larger example we note the following. If we let p
denote the proportion of XOR gates within a circuit, and we let N denote the
amount of data needed to be sent per gate in the standard Yao construction,
then the average amount of data needed to be sent per circuit gate when using
the free XOR gates and GRR methods is $\frac{3}{4}(1 - p) N$. Whereas if we do
not use the free XOR gate method and instead use the method based on secret
sharing, this value becomes N/2. Hence, even if we are willing to assume correlation
robust KDFs, the method which uses secret sharing and does not use the
free XOR method will be more efficient as long as the fraction of XOR gates, p,
is smaller than 1/3, since $\frac{3}{4}(1 - p) > \frac{1}{2}$ exactly when $p < 1/3$. However as can be
seen from the column entitled “% XOR
Gates”, this proportion is generally much larger than 1/3, especially in the case
of Covert and Malicious adversaries where we have had to extend the circuit
by a large linear component. This expansion is performed to cope with possible
adversarial behaviour related to P2’s input, see the full version for details. One
should note that these theoretical estimates of bandwidth are never achieved
fully in practice due to overheads in the underlying data transmission mechanism
and the fact that they assume a bit-oriented communication mechanism, whereas
practical communication is performed in bytes. Hence the saving we achieve in
gate transmission is about 5-10% less than one would predict purely by theory.
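The break-even point between the two approaches can be checked with exact arithmetic. The short Python sketch below only restates the two per-gate estimates given above; the function names are hypothetical and N is normalised to 1 by default.

```python
from fractions import Fraction


def free_xor_grr_cost(p, N=1):
    """Average data per gate with free XOR and GRR: (3/4) * (1 - p) * N."""
    return Fraction(3, 4) * (1 - p) * N


def secret_sharing_cost(N=1):
    """Average data per gate with the secret sharing method: N / 2."""
    return Fraction(1, 2) * N


def secret_sharing_is_cheaper(p):
    """True when secret sharing sends less data than free XOR with GRR."""
    return secret_sharing_cost() < free_xor_grr_cost(p)
```

For example, with a quarter of the gates being XOR gates the free XOR and GRR combination sends 9/16 of N per gate, which is more than N/2, whereas with half of the gates being XOR gates it sends only 3/8 of N. The two estimates coincide at p = 1/3.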


Example 2 - Evaluating AES: As our second example we created a circuit
which computes an AES encryption of a single 128-bit block with respect to
a 128-bit key. Here P1’s input is the secret key, and P2’s input is the message
block. We require that P2 learns the encryption of its message under P1’s secret
key, and that P1 learns nothing. Compiling such a circuit using the Fairplay
compiler, and applying various optimisations, resulted in a circuit
with 33880 gates, where each gate is a 2-to-1 gate. This circuit
was derived in a way to try to minimize the number of non-XOR gates. Again,
we stress, the above circuit size purely implements the AES functionality; it
does not include the extension of the circuit to cope with P2’s input in the case
of Covert and Malicious adversaries. Note that the key schedule takes up only
about 15% of the circuit, hence encrypting a sequence of message blocks as in
CBC-Mode encryption will scale almost linearly with respect to our data.

Table 2. Experimental Results for Example 2 (again times are in seconds)

Adversary    Input Enc.     Method    No. of gates  Total Time  Total KBytes
Semi-Honest  -              PRF-SS    33880         -           -
Semi-Honest  -              CoR-GRR   33880         -           -
Semi-Honest  -              ROM-GRR   33880         -           -
Covert       Indep. Inputs  PRF-SS    34264         -           -
Covert       Indep. Inputs  CoR-GRR   34264         -           -
Covert       Indep. Inputs  ROM-GRR   34264         -           -
Malic.       Random Comb.   Base      -             2624        987442
Malic.       Random Comb.   PRF-SS    45944         2439        711729
Malic.       Random Comb.   CoR-GRR   45960         -           -
Malic.       Random Comb.   ROM-GRR   45881         1114        417907

We repeated our experiments from above, but in Table 2 we only present the
times for the most efficient choice for the input encoding.

We conclude that performing the Yao protocol is certainly feasible on complicated
functionalities such as AES encryption. For the case of honest and covert
adversaries we again see that the computation and bandwidth consumed, when
we use correlation robust KDFs and the GRR method, are greatly reduced in
comparison to the base case. If one is not willing to assume correlation robust KDFs
(or use the ROM) then our secret sharing based optimisation greatly reduces
the bandwidth without affecting the run times. For the malicious case the
improvement in the secret sharing based version is less pronounced due to the large
number of commitments which need to be transmitted and opened. This clearly
points to the place where future optimisation research needs to be performed,
namely in reducing the number of commitments needed in the situation of
malicious adversaries. However even without such future optimisation we note that
the run time can be significantly reduced by taking advantage of the inherent
parallelism in the algorithm in the Malicious case (in which P1 generates many
commitments and P2 verifies a subset of them). For web service or cloud
computing applications, where server farms are commonplace, an improvement in
computational time by a factor around $s_1$ could be expected.

We end by noting that many application domains of a secure evaluation
of AES, for example the one-time program example, require only
security against semi-honest adversaries. Hence, such applications are already
within the reach of practical realisation. Furthermore, this application requires
no computation of the OT or data to be sent. Thus the party generating the
one-time-program will take the time needed in our Precomp Time column, and
the evaluator (after querying the one-time-memory) will take the time needed
in the Calc Time column.

Abstract. Multi-party secure computations are general and important procedures for
computing any function while preserving the security of private inputs. In this work
we ask whether preprocessing can allow low latency (that is, small round) secure
multi-party protocols that are universally-composable (UC). In particular, we allow
any polynomial time preprocessing as long as it is independent of the exact
circuit and actual inputs of the specific instance problem to solve, with only a
bound k on the number of gates in the circuits known.

To address the question, we first define the model of “Multi-Party Computation
on Encrypted Data” (MP-CED), implicitly described in [DN03]. In this model,
computing parties establish a threshold public key in a preprocessing stage, and
only then private data, encrypted under the shared public key,
is revealed. The computing parties then get the computational circuit they agree
upon and evaluate the circuit on the encrypted data. The MP-CED model is
interesting since it is well suited for modern computing environments, where many
repeated computations on overlapping data are performed.


We present two different round-efficient protocols in this model: 


— The first protocol generates k garbled gates in the preprocessing stage and 
requires only two (online) rounds. 

— The second protocol generates a garbled universal circuit of size O(k log k) 
in the preprocessing stage, and requires only one (online) round (i.e., an 
obvious lower bound), and therefore it can run asynchronously. 


Both protocols are secure against an active, static adversary controlling any num- 
ber of parties. When the fraction of parties the adversary can corrupt is less than 
half, the adversary cannot force the protocols to abort. 

The MP-CED model is closely related to the general Multi-Party Computation 
(MPC) model and, in fact, both can be reduced to each other. The first (resp.
second) protocol above naturally gives protocols for three-round (resp. two-round)
universally composable MPC secure against an active, static adversary controlling
any number of parties (with preprocessing).

1 Introduction


Secure Multi-party Computation (MPC). Protocols for MPC enable a set of parties 
to correctly evaluate a function such that no information about the private inputs of the 
parties is revealed, beyond what is leaked by the output of the function. This notion was 
first presented by Yao for the two-party case, and by Goldreich et al. 
for the multi-party case. However, implementations for MPC are notoriously ineffi- 
cient. Many protocols implementing them have delays associated with the depth of the 
circuit and even constant round protocols produce very long delays. The question that 
we want to settle in this work is whether one can use preprocessing computation in order 
to “be ready” once the inputs and the actual circuit (problem) to compute on are given. 
Note that the world of computing is transforming into “cloud services” where parties 
can “rent” computational resources. Thus, it may make sense to perform a lengthy pre- 
processing in the background, with no specific input and problem to solve in mind, just 
as a preparation. To this end cloud resources can be employed on behalf of users, and 
massive computations and communication can be performed. Then in the online stage 
once the input is given and the circuit determined, it can be performed much faster given 
the preprocessing. As long as at least one of the servers in the cloud is not corrupted, 
the correctness and privacy of the online stage computation is guaranteed. 

We consider the following variation on secure multi-party computation, called multi- 
party computing with encrypted data (MP-CED): (1) The computing parties publish a 
shared public key, and hold shares of the matching private key. (2) The parties also know 
some bound on the circuit size that they will be required to compute securely. The par- 
ties then perform a preprocessing stage. For this stage too, we may try to minimize the 
parties’ work and computation rounds, but this is not the main goal, which is the effi- 
ciency of the on-line stage. (3) The input distribution is a database of encrypted data that 
can be published by many parties (not necessarily those taking part in the computation); 
i.e., think about the parties as a service (like the census bureau) computing on behalf 
of a larger population. (4) The concrete computation circuit (or circuits) is given, and 
the input to use from the database (their indices in the database) are determined. Then 
and only then (5) the parties are engaged in a short computation to achieve the task and 
produce the output while protecting the private data. Note that the input database may 
be reused for many computations. 

We remark that our model is somewhat related to a multi-party extension of the 
model by Rivest, Adleman and Dertouzos [RAD78]. They put forth a scenario for secure 
computation over database of encrypted data, called Computing with Encrypted Data 
(CED). This model is highly attractive since it represents the case where a database is 
first collected and maintained and only later a computation on it is decided upon and 
executed (e.g., data mining and statistical database computation done over the encrypted 
database). We discuss the encrypted data model and the multi-party version here, and in
fact show that MP-CED and MPC can be reduced to each other (shown in Section 3.3).


1.1 Motivation 


We consider protocols in the universal composability (UC) framework introduced by
Canetti [C01]. UC secure protocols remain secure even when executed concurrently
with arbitrary other protocols running in some larger network, and can be used as
subroutines of larger protocols in a modular fashion.


Round-Efficient Protocols with Preprocessing. Round complexity is an important
criterion for the efficiency of an MPC protocol. A long line of work, including
[DIK+08], focused on reducing both the round complexity and the communication
complexity.

Also, it is known that UC secure computation of general functions is not possible 
in the plain model in the case of honest minority. In particular, UC secure two-party 
computation of a wide class of functionalities was ruled out by [CKL03]. To
circumvent these impossibility results, it is common to assume some pre-computation 
setup, and the most common assumption is that a common reference string (CRS) is 
made available to the parties before the computation. Canetti et al. showed 
that (under suitable cryptographic assumptions) a CRS suffices for UC secure MPC of 
any well-formed functionality. 

In our work, we consider a stronger relaxation of the setup, called general preprocessing,
in which the parties may perform any work as long as it is independent of the
inputs and the circuit for which the actual computation is to be done later. The main
motivation for this model is to reduce the amount of work during the execution of the
protocol beyond a preprocessing phase.

Considering the two aspects above, we ask the following natural question: 


Allowing any polynomial time preprocessing (in some input parameter) before 
the circuit (whose size is bound by the same input parameter) and the inputs 
are known, is there a very small constant round protocol? 


1.2 Our Results 


We answer the aforementioned question affirmatively by constructing two different
round-efficient protocols for MP-CED, which we call P1 and P2. Both protocols can be
naturally transformed into round-efficient protocols for MPC (cf. Section 3.3). Each
protocol has its own advantage depending on the following parameters:


1. round complexity in the online stage (our major concern), 
2. round complexity in the preprocessing stage, and 
3. the number of gates constructed throughout the protocol. 


In terms of online round complexity, protocol P1 takes two rounds whereas protocol
P2 takes one round (which is optimal, since even non-secure computation needs
one round to collect the data). There are some cases, however, in which the
preprocessing round complexity of P1 is better, under some efficiency considerations.
We use general constant-round MPC protocols for the preprocessing stage in
P2, whereas in P1 we can use the protocol given in Appendix A, which requires exactly
2n rounds. When n is small enough, preprocessing in P1 can be more round-efficient


' Preprocessing in is independent only of the inputs (it depends on the circuit to be 
evaluated), whereas we require preprocessing to be independent both of the circuit and of the 
inputs. 
(when n is large, a general MPC protocol can be used in P1, too). Also, the number of
gates constructed in P2 is larger than that in P1. To evaluate a circuit with up to k gates,
P1 constructs k garbled gates in the preprocessing stage, as explained below. In contrast,
P2 generates a universal circuit in the preprocessing stage, which is later
used (in the online stage) to evaluate a given circuit. The smallest known universal
circuit that can evaluate a circuit with k gates has O(k log k) gates [KS08]. We overview
the two protocols in the following.

First Protocol (P1). At a high level, we follow the framework of Yao’s garbled
circuit technique. The main difference is that, in our protocol, garbling is done
at the level of individual gates, so that this procedure can be executed in the preprocessing
stage independently of the circuit to be given and computed later. In the online stage,
the wires between gates are constructed according to the given circuit.


— In the preprocessing stage, the parties generate a ‘garbled’ truth table for each
individual gate. Truth tables are for NAND gates, and they have four rows and three
columns: left-input, right-input, and output. The rows are randomly shuffled, and
each element is an encryption of a Boolean value. We emphasize that no party knows
anything more than the fact that it is a randomly shuffled encrypted table for NAND.

In addition, a fresh pair of public key and (encrypted) private key is generated 
for each row. This key is used for constructing encrypted wiring information in the 
online stage, when the circuit is given. 

— In the online stage, given the encrypted data and a circuit, the computing parties
‘connect’ truth tables by adding wiring information. Given two tables $T_{pred}$, $T_{succ}$
that are adjacent in the topology of the circuit, the wiring information tells which row
of the output column of $T_{pred}$ is equal to which row of the input column of $T_{succ}$. We note
that this information should be carefully revealed; otherwise, the adversary may try
computing different rows of the truth tables using the wirings, and may learn more
than is allowed. In fact, during the computation (online stage), exactly one row’s
wirings for each table should be revealed.

To enable such wirings we introduce Multi-Party Conditional Oblivious Decryption
Exposure (M-CODE) (in Section 2), which is a multi-party extension of the
CODE functionality, introduced for the two party case. M-CODE assumes
a group of parties share a secret key $x$ of a public key $y$. Three ciphertexts
$c_{out}, c_{in}, c_{key}$, all encrypted under $y$, and a new public key $z$ are given as input.
For $\ell \in \{out, in, key\}$, let $m_\ell$ be the plaintext encrypted in $c_\ell$. If $m_{out}$ equals $m_{in}$,
M-CODE outputs $E_z(m_{key})$. Otherwise, M-CODE chooses a random value $r$ and
outputs $E_z(r)$. The computing parties use M-CODE such that, for each row of a truth
table, the three ciphertexts of the M-CODE are (1) the output value of the previous table,
(2) the input value of this row, and (3) the secret key for this row. We refer the reader
to Section 3.1 for more details.

With a two-round implementation of M-CODE for ElGamal encryption, we obtain
a two-round protocol for MP-CED and a three-round protocol for MPC.


Theorem 1. Assuming the DDH assumption holds, protocol P1 is a two-round UC
secure protocol for MP-CED in the $\mathcal{F}_{ZK}$ hybrid (and thereby a three-round UC secure
protocol for MPC in the $\mathcal{F}_{ZK}$ hybrid in the general preprocessing model) against an
active and static adversary as long as at most t < n computing parties are corrupted.
The protocol manipulates a number of gates that is linear in the circuit size. Furthermore, if
t < n/2 parties are corrupted, P1 is robust against abort.


Second Protocol (P2). Protocol P2 follows Yao’s garbled circuit technique more closely than
P1. However, the circuit to be garbled is a universal circuit, in order to maintain
independence of the circuit to be given. Optimal round complexity in the online
stage is achieved by putting a simple constraint on the input-layer labels in the garbled
circuit and by employing the multiplicative homomorphism of ElGamal encryption. As
in the first protocol, a group of parties share a secret key x of a public key y.


— In the preprocessing stage, the parties generate a garbled circuit of a universal
circuit $C_U$, with some special restrictions on keys of input wires. In the garbled
circuit, there are two keys $w_i^0$ and $w_i^1$ for each wire $i$, where $w_i^b$ corresponds to the
wire carrying bit $b$ (see Section 3.2 for more detail). The special restriction on input
wires is that $w_i^1 / w_i^0 = h$ for a random global value $h$ unknown to any party. The two
keys can be constructed by picking $w_i^0$ uniformly at random and letting $w_i^1 = h \cdot w_i^0$.
In addition to the garbled circuit of $C_U$, the following encryptions are generated:
(1) the encryption $E_y(h)$ and (2) $E_y(w_i^0)$ for each input wire $i$. Construction of the
garbled circuit along with the aforementioned encryptions, i.e., $E_y(h)$ and the $E_y(w_i^0)$'s,
can be performed using a constant-round UC secure protocol for general MPC
[IPS08]. Input contribution of a bit 0 is done by $E_y(h^0)$, and for a bit 1, a
re-encryption of $E_y(h^1)$ is used via homomorphism.

— In the online stage, for each input wire $i$ where a bit $b$ is the contributed input for the
wire, computing parties obtain $w_i^b$. The encryption $E_y(w_i^b)$ can be obtained via
homomorphism given the encrypted input $c_i = E_y(h^b)$, giving
$E_y(w_i^0) \cdot c_i = E_y(w_i^0 h^b) = E_y(w_i^b)$, since $w_i^1 = h \cdot w_i^0$. Now the parties
obtain the key $w_i^b$ for each input wire $i$ using
threshold decryption and can locally evaluate the garbled circuit. Note that $w_i^b$ does
not leak any information on $b$ since it is randomly distributed (with $w_i^{1-b}$ hidden).
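
The input-wire trick of P2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group is a toy order-1019 subgroup of $\mathbb{Z}_{2039}^*$ (far too small for real use), a single secret key `x` stands in for the threshold-shared key, and the helper names (`setup_input_wire`, `contribute`, `online_key`) are hypothetical and not taken from the paper.

```python
import secrets

# Toy group (assumption): order-Q subgroup of Z_P^* for the safe prime P = 2Q + 1.
P, Q, G = 2039, 1019, 4

def enc(y, m):
    """ElGamal encryption E_y(m) = (g^r, m * y^r)."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), m * pow(y, r, P) % P

def dec(x, c):
    """Decryption with the full key x (stands in for threshold decryption)."""
    return c[1] * pow(pow(c[0], x, P), -1, P) % P

def mul(c1, c2):
    """Component-wise product: E_y(m1) * E_y(m2) = E_y(m1 * m2)."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P

def setup_input_wire(y, h):
    """Preprocessing: pick w0 at random, set w1 = h * w0, publish only E_y(w0)."""
    w0 = pow(G, secrets.randbelow(Q), P)
    return (w0, h * w0 % P), enc(y, w0)

def contribute(y, H, b):
    """Input contribution E_y(h^b): E_y(1) for b = 0, a re-encryption of H for b = 1."""
    fresh_one = enc(y, 1)
    return mul(H, fresh_one) if b else fresh_one

def online_key(x, enc_w0, c):
    """Online stage: decrypt E_y(w0) * c to obtain the wire key w^b."""
    return dec(x, mul(enc_w0, c))
```

With `H = enc(y, h)`, calling `online_key(x, enc_w0, contribute(y, H, b))` returns `w0` for `b = 0` and `w1 = h * w0` for `b = 1`, which is the single online decryption the protocol relies on.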


Theorem 2. Assuming the DDH assumption holds, protocol P2 is a one-round UC
secure protocol for MP-CED in the $\mathcal{F}_{ZK}$ hybrid (and thereby a two-round UC secure
protocol for MPC in the $\mathcal{F}_{ZK}$ hybrid in the general preprocessing model) against an
active and static adversary as long as at most t < n computing parties are corrupted.
The protocol processes O(k log k) gates where k is the circuit size. Furthermore, if t < n/2
parties are corrupted, P2 is robust against abort.


1.3 Related Work 


Round Complexity. Beaver et al. showed the first MPC protocol that required a
constant (but large) number of rounds, and Damgård and Ishai presented the first
adaptively UC secure protocol that achieves two rounds in the (linear) preprocessing
model when the number of malicious parties t < n/5 and some higher constant rounds
when t < n/2. Recently, Ishai et al. constructed a UC secure protocol with malicious
majority in the OT hybrid model running in (large) constant rounds (see Fig. 1).


2 Instantiation of protocol P1 (in particular, key setup in the preprocessing) is parameterized by
t. Therefore protocol P1 is not a ‘best-of-both-worlds’ protocol [IKLP06]. This is true of P2,
too.
#corr = number of corrupted parties, B = Boolean, Ar = Arithmetic, St = static, Ad = adaptive,
SA = stand-alone


Fig. 1. UC Secure Constant-Round MPC Protocols (Left) and MP-CED Protocols (Right). 
We denote by d the depth of a given circuit, by n the number of parties, and by t the number 
of corrupted parties. Pı and P2 denote the protocols proposed here. Here the column ‘rounds’ 
means the number of rounds in the online stage. 


For the two-party setting, which is a special case of MPC, Katz and Ostrovsky
showed that it is impossible to construct a secure protocol running in four rounds using
enhanced trapdoor permutation (eTDP) or homomorphic encryption in a black-box
manner in the plain model, and they constructed a five-round protocol. To overcome this
lower bound, Horvitz and Katz used a CRS to construct a UC secure two-party
protocol in two rounds. Nielsen and Orlandi gave a two-party protocol using a
cut-and-choose approach. At a high level, their idea is somewhat similar to ours: after
many garbled gates are generated, they are connected to each other according to the
circuit to be evaluated.

In the (non-UC) stand-alone setting, prior work gave a general
non-interactive reduction of any n-party functionality computed by a polynomial size
Boolean circuit into a (possibly randomized) functionality of degree-3 over GF(2).
Combining this reduction with any secure protocol with malicious majority (for example,
[GMW87]) leads to round-efficient protocols in the stand-alone setting.


MP-CED. Several nontrivial instantiations for CED were shown, starting with Sander
et al., who gave a protocol for circuits in NC1. This result was later extended
to any function in NLOGSPACE [BL96]. Recently, Gentry presented
a construction for any polynomial size circuit using a doubly-homomorphic
encryption scheme from ideal lattices [G09]; however, it is not yet clear if this can give
efficient protocols for MP-CED (see the discussion later in the paper).

MP-CED was also considered by Franklin and Haber and the subsequent
works [D00] [CDN01] [DN03]. In their works, after a shared encryption key is established,
each party broadcasts the encryption of its input, and the parties evaluate
the circuit on the encrypted data. However, they do not explicitly treat the setting as
a unique model for MP-CED, with a specific setup stage that is independent of the inputs
and the circuits to be computed, and do not consider input separation, where inputs can
be contributed by parties that do not take part in the computation. Note that all these
previous works in the model dealt with the two party case, which we extend herein to
the multi-party case.
The protocol given by Cramer et al. computes an arithmetic circuit and
achieves security in the case of honest majority, but the number of rounds is linear in
the depth of the circuit. A UC adaptively secure protocol with the same round complexity
was given by [DN03]. Jakobsson and Juels use a mix-and-match approach to
compute on encrypted data, but their approach requires even more rounds (linear in the
sum of the depth of the circuit and the number of parties). Figure 1 lists these previous
works in relation to our protocols (while concentrating on online rounds, and
omitting some of the advantages our results have beyond the table).


2 Preliminaries 


For any integer $t$, let $[t] = \{0, 1, \ldots, t - 1\}$. Let $k$ be a security parameter. We choose
a cyclic group $G_q$ of order $q \approx 2^k$ with a generator $g$ where the DDH problem
is hard. For example, $G_q$ can be a subgroup of order $q$ of a multiplicative group $\mathbb{Z}_p^*$ for
a safe prime $p = 2q + 1$, i.e., $G_q = \{g^0, g^1, \ldots, g^{q-1}\} \pmod p$. We assume $G_q$ is
known in advance.


ElGamal Encryption. ElGamal encryption is semantically secure under the
DDH assumption over $G_q$ [TY98]. The key generation algorithm generates a public/secret
key pair $(y, x)$ where $x \in_R [q]$ and $y = g^x$. Encryption of a message $m \in G_q$
under a public key $y$, denoted by $E_y(m)$, is $(g^r, m y^r)$ where $r \in_R [q]$. Decryption of a
ciphertext $c = (\alpha, \beta)$ with the secret key $x$, denoted by $D_x(c)$, is $\beta / \alpha^x$.


Homomorphism. Multiplication of two ciphertexts $E_y(m_1) = (g^{r_1}, m_1 y^{r_1})$ and
$E_y(m_2) = (g^{r_2}, m_2 y^{r_2})$ is defined as $(g^{r_1 + r_2}, m_1 m_2 y^{r_1 + r_2})$, which shows the
homomorphism of ElGamal encryption (i.e., $E_y(m_1) \cdot E_y(m_2) = E_y(m_1 \cdot m_2)$). In addition,
encryption keys are also homomorphic in the sense that given key pairs
$\{(y_i = g^{x_i}, x_i)\}_i$, the pair $(\prod_i y_i, \sum_i x_i)$ is a valid key pair. When two ciphertexts encrypt the
same message, we denote $c_1 = c_2$.
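
As a concrete check of these identities, the following minimal Python sketch implements ElGamal over a toy group. The parameters (safe prime 2039, subgroup order 1019, generator 4) and the function names are assumptions chosen for illustration only; they are far too small to be secure.

```python
import secrets

# Toy parameters (assumption): safe prime P = 2Q + 1, G generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4

def keygen():
    """Return (public key y = g^x, secret key x)."""
    x = secrets.randbelow(Q)
    return pow(G, x, P), x

def encrypt(y, m):
    """E_y(m) = (g^r, m * y^r) for a random r in [Q]."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), m * pow(y, r, P) % P

def decrypt(x, c):
    """D_x((alpha, beta)) = beta / alpha^x."""
    alpha, beta = c
    return beta * pow(pow(alpha, x, P), -1, P) % P

def multiply(c1, c2):
    """Component-wise product of two ciphertexts under the same key."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P
```

Decrypting `multiply(encrypt(y, m1), encrypt(y, m2))` yields `m1 * m2 mod P`, and a ciphertext under the product key `y1 * y2` decrypts with the summed secret key `(x1 + x2) mod Q`.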


Zero-Knowledge Proofs of Knowledge (ZK-PoK). A proof of knowledge is a proof 
for a relation R, in which the prover convinces the verifier that an instance is in the 
language, and also that the prover knows a witness for this instance. We will use 
standard notation to denote proofs of knowledge related to discrete log. For example, 
$PK\{b : a = g^b\}$ denotes a proof of knowledge where the prover convinces the verifier
that she knows the value of $b$ such that $a = g^b$, when $a$ is known to both.

In the common reference string (CRS) model, we can use non-interactive zero-knowledge
proofs (NIZK) due to De Santis et al. (see the discussion in
Section 6), which are UC-secure [C01]. In the random oracle model (ROM),
the above proof systems can be made efficient NIZK using the standard Fiat-Shamir technique
combined with OR proofs of Σ-protocols [CDS94].


Secret Sharing [S79, F87]. A secret sharing scheme allows a secret $s \in [q]$ to be shared
among $n$ parties, such that a threshold of $t + 1$ parties can recover the secret, whereas
any smaller set of parties cannot learn anything about the secret. In Shamir’s secret
sharing scheme, the shares are values of a degree-$t$ polynomial, and the secret is the
free coefficient of the polynomial.

We show below how the parties can share and recover the secret $s$. Moreover, the
parties may choose to recover $d^s$ for some $d \in G_q$, or an ElGamal encryption of $d^s$
(without learning anything about the secret $s$).


— Sharing: A dealer chooses at random a degree $t$ polynomial $Q(x) := s + a_1 x + \cdots + a_t x^t$,
where the free coefficient is the secret $s$. The share of party $P_i$ is $s_i = Q(i)$.

— Recovering $s$: Let $T$ be a set of $t + 1$ parties. They evaluate $Q'(0) = \sum_{i \in T} s_i L_i(0)$
to recover $s$, where $L_i$ is a Lagrangian on the points in $T$ (footnote 3).

— Recovering an exponentiation $d^s$: Similar to above, the parties can evaluate
$d^s = d^{Q'(0)} = d^{\sum_{i \in T} s_i L_i(0)} = \prod_{i \in T} (d^{s_i})^{L_i(0)}$, using only $\{d^{s_i}\}_{i \in T}$.

— Recovering $E_y(d^s)$: Using multiplicative homomorphism of ElGamal, the parties
evaluate $E_y(d^s) = E_y(d^{Q'(0)}) = \prod_{i \in T} E_y(d^{s_i L_i(0)}) = \prod_{i \in T} E_y(d^{s_i})^{L_i(0)}$, using
only $\{E_y(d^{s_i})\}_{i \in T}$.
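
The sharing and recovery steps above can be tried out with the following Python sketch. It assumes a toy group (order-1019 subgroup of $\mathbb{Z}_{2039}^*$) and hypothetical helper names; it illustrates Shamir sharing over $[q]$ and Lagrange interpolation "in the exponent", not the paper's full protocol.

```python
import secrets

# Toy parameters (assumption): shares live in Z_Q, group elements in the
# order-Q subgroup of Z_P^* generated by G.
P, Q, G = 2039, 1019, 4

def share(s, t, n):
    """Shares s_i = poly(i) of a random degree-t polynomial with free coefficient s."""
    coeffs = [s] + [secrets.randbelow(Q) for _ in range(t)]
    return {i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
            for i in range(1, n + 1)}

def lagrange_at_zero(i, T):
    """L_i(0) = prod_{j in T, j != i} (0 - j) / (i - j), computed mod Q."""
    num = den = 1
    for j in T:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q

def recover(shares):
    """Recover s from t + 1 shares {i: s_i}."""
    T = list(shares)
    return sum(shares[i] * lagrange_at_zero(i, T) for i in T) % Q

def recover_in_exponent(exp_shares):
    """Recover d^s from {i: d^{s_i}} without learning s."""
    T = list(exp_shares)
    out = 1
    for i in T:
        out = out * pow(exp_shares[i], lagrange_at_zero(i, T), P) % P
    return out
```

Recovering $E_y(d^s)$ works the same way, with each ciphertext component raised to $L_i(0)$ and the results multiplied component-wise.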


Multi-party Conditional Oblivious Decryption Exposure (M-CODE). M-CODE assumes
a group of parties share a secret key $x$ of a public key $y$. Three ciphertexts $c_{out}, c_{in}, c_{key}$,
all encrypted under $y$, and a new public key $z$ are given as input. For $\ell \in
\{out, in, key\}$, let $m_\ell$ be the plaintext encrypted in $c_\ell$. If $m_{out}$ equals $m_{in}$, M-CODE
outputs $E_z(m_{key})$. Otherwise, M-CODE chooses a random value $r$ and outputs $E_z(r)$. A
variant of this functionality for the two party case was introduced in earlier work.
The intuitive idea is to generate a ciphertext that encrypts $m_{key}$ multiplied by
$(m_{out}/m_{in})^r$ for a random $r$. If $m_{in} = m_{out}$, then the output would be $m_{key}$. We assume party
$P_i$ has $x_i$, all the parties know $c_{out}, c_{in}, c_{key}, z, (y, y_1 = g^{x_1}, \ldots, y_n = g^{x_n})$, and let
$c_{out} = E_y(m_{out}) = (\alpha, \beta)$, $c_{in} = E_y(m_{in}) = (\gamma, \delta)$, $c_{key} = E_y(m_{key}) = (\lambda, \mu)$. The
protocol for M-CODE proceeds as follows:


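1. Each party $P_i$ chooses $e_i \in_R [q]$, computes $\epsilon_i = (\alpha/\gamma)^{e_i}$, $\zeta_i = (\beta/\delta)^{e_i}$, and
$\pi_i = PK\{e_i : \epsilon_i = (\alpha/\gamma)^{e_i} \text{ and } \zeta_i = (\beta/\delta)^{e_i}\}$, and broadcasts $(\epsilon_i, \zeta_i, \pi_i)$.

2. Let $\epsilon = \prod_{i \in S_1} \epsilon_i$ and $\zeta = \prod_{i \in S_1} \zeta_i$, where $S_1$ is the set of parties which sent
valid messages. Each party $P_i$ chooses $r_i$ randomly, computes $d_i = (d_{i1}, d_{i2}) =
E_z((\epsilon\lambda)^{x_i})$ and $\psi_i = PK\{(r_i, x_i) : d_{i1} = g^{r_i},\ d_{i2} = z^{r_i} (\epsilon\lambda)^{x_i},\ y_i = g^{x_i}\}$,
and broadcasts $(d_i, \psi_i)$.

3. Let $S_2$ be the set of parties that sent valid messages in steps 1 & 2. If $|S_2| \le t$,
then the protocol aborts. Each party $P_i$, using the homomorphic multiplication, computes
$d = (d_1, d_2) = E_z((\epsilon\lambda)^x) = \prod_{j \in S_2} d_j^{L_j(0)}$, where $L_j(\cdot)$ is a Lagrangian
on the indices in $S_2$. $P_i$ uses homomorphic operations to compute the output ciphertext
$(1/d_1, \zeta\mu/d_2)$, which is

$$E_z\big(\zeta\mu / (\epsilon\lambda)^x\big) = E_z\big((\zeta/\epsilon^x)(\mu/\lambda^x)\big) = E_z\big((m_{out}/m_{in})^e \cdot m_{key}\big),$$

where $e = \sum_{j \in S_1} e_j$.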
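
To see why the output decrypts to $m_{key}$ exactly when $m_{out} = m_{in}$, the algebra of the three steps can be replayed in Python. This sketch makes several simplifying assumptions that are not part of the protocol: all parties are collapsed into one holder of the secret key `x` (no secret sharing, no proofs of knowledge), the blinding exponent `e` is forced to be nonzero so the mismatch case is deterministic, and the group is a toy order-1019 subgroup of $\mathbb{Z}_{2039}^*$. The function names are hypothetical.

```python
import secrets

# Toy group (assumption): order-Q subgroup of Z_P^*, generated by G.
P, Q, G = 2039, 1019, 4

def inv(a):
    """Multiplicative inverse modulo P."""
    return pow(a, -1, P)

def enc(pk, m):
    """ElGamal encryption E_pk(m) = (g^r, m * pk^r)."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), m * pow(pk, r, P) % P

def dec(sk, c):
    """ElGamal decryption beta / alpha^sk."""
    return c[1] * inv(pow(c[0], sk, P)) % P

def m_code(x, z, c_out, c_in, c_key):
    """Single-party replay of M-CODE; x is the secret key for y = g^x."""
    (alpha, beta), (gamma, delta), (lam, mu) = c_out, c_in, c_key
    e = 1 + secrets.randbelow(Q - 1)              # nonzero blinding exponent (assumption)
    eps = pow(alpha * inv(gamma) % P, e, P)       # epsilon = (alpha / gamma)^e
    zeta = pow(beta * inv(delta) % P, e, P)       # zeta = (beta / delta)^e
    d1, d2 = enc(z, pow(eps * lam % P, x, P))     # d = E_z((epsilon * lambda)^x)
    return inv(d1), zeta * mu % P * inv(d2) % P   # (1 / d1, zeta * mu / d2)
```

Decrypting the result under the secret key of `z` gives $(m_{out}/m_{in})^e \cdot m_{key}$, which equals $m_{key}$ when the two plaintexts match and is blinded otherwise.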


3 The Lagrangian $L_i$ on the points in $T$ is a degree $t$ polynomial such that $L_i(x) = 1$ if $x = i$
and $L_i(x) = 0$ if $x \in T$ and $x \ne i$. The polynomial $Q'(x) = \sum_{i \in T} s_i L_i(x)$ is a degree $t$
polynomial that goes through the points $(i, s_i)_{i \in T}$, and thus must be $Q(x)$.
3 Multi-party Computing with Encrypted Data


We assume the circuit C of interest is normalized: all intermediate gates are NAND
gates, and output gates are IDENTITY gates. We can easily attain this circuit by adding
another layer of IDENTITY gates on top of a circuit that consists of NAND gates.


3.1 First Protocol (P1) 


In the first protocol, called P1, each gate is garbled, and then the computing parties 
‘connect’ gates by adding wiring information using M-CODE. 


Preprocessing Stage. The first step is to establish a global public key y for ElGamal 
encryption. The computing parties have shares of the corresponding secret key x. Once 
the public key is established, the next step is to generate truth tables for individual gates. 
The columns of input, output, and intermediate gates differ slightly, as can be seen in
Figure 2, which shows the structure of truth tables.


1. Input and Output. These are encrypted with the global public key y. 

2. Placeholders for the wiring information. This connects a row of the truth table to 
matching rows in successor gates. 

3. The columns PK and SK contain a random ElGamal key pair, where the private key 
is encrypted under the global public key y (and the wiring information is encrypted 
using the secret keys in SK). 

4. For output gates, ciphertexts in column Final encrypt the same plaintexts as cipher- 
texts in column In. 


During the preprocessing stage, the parties can generate a polynomial number of garbled
gates, which can later be used for evaluating circuits. Therefore it suffices to know a
bound on the sizes of circuits to be evaluated later. Preprocessing can be done in a constant
number of rounds using general MPC protocols [IPS08]. If the number
of computing parties is small, it can be done explicitly in 2n rounds, where n is the
number of computing parties, using the protocol in Appendix A.

Input contribution is performed by publishing a ciphertext $c = (c_1, c_2) = E_y(g^b)$
for an input $b \in \{0, 1\}$. This can be done securely by adding
$PK\{r : (c_1 = g^r, c_2 = y^r) \text{ or } (c_1 = g^r, c_2 = g y^r)\}$.


Online Stage: Generation of Wires Between Garbled Gates. In Figure 2, $G_i$ is the
left predecessor of $G_k$. The connection between the two gates should be established
through some “wiring” such that during the computation the output of $G_i$ can be propagated
to the left input of $G_k$. So, rows of $T_i$ with output value $b \in \{0, 1\}$ should be
connected to rows of $T_k$ with left input value $b$.


Requirements for Wiring. In our protocol, the following conditions are considered in 
generating wires. 


4 An IDENTITY gate has a single input bit (wire) and output bit, and it copies the input bit value
to its output.
[Figure 2: garbled truth tables $T_i$, $T_j$, $T_k$, $T_\ell$ with columns In_L, In_R, Out, PK_L, SK_L, PK_R, SK_R and Wires, shown next to the gate topology.]

Fig. 2. Garbled Truth Tables for the Gates ($G_i$, $G_j$, $G_k$, $G_\ell$). The topology of the gates is given
on the right. $G_j$ is an input gate, $G_\ell$ is an output gate, and $G_i$, $G_k$ are intermediate gates. Table $T_x$
is the truth table describing gate $G_x$. $y$ is the global public key. Each row of an intermediate truth
table has two sets of (secret, public) keys, and contains the wiring information, “connecting” it to
the next gate, encrypted using these two keys. E[0] and E[1] are $E_y(g^0)$ and $E_y(g^1)$ respectively.
In table $T_i$, $pk_{i1} = pk_{i1L} \cdot pk_{i1R}$, and $pk_{i2}, \ldots, pk_{i4}$ are defined similarly. In the Wires columns,
$E(a, b, c, d)$ denotes concatenation of $E(a), \ldots, E(d)$.


— (Encrypting the Wiring Information.) The wiring information, except wirings connecting
an input gate to an intermediate gate, should be encrypted. Public wiring
may help the (even semi-honest) adversary to learn more information than the output
of C. Therefore, it is encrypted with the public keys stored in columns PK_L and
PK_R.

— (Conditional Exposure of Wiring Information.) For the computation to proceed, the
protocol should reveal the wiring information for the rows along the computational
path. In the beginning, wirings from input gates are public. Along the computational
path, on each gate, exactly one row should allow decryption of the wiring information.

— (Oblivious Generation of Wiring Information.) The wiring information is added to
garbled gates after they are built. It is essential that, even though the truth table is encrypted
and shuffled, the parties are still able to add the wiring information.


Computation of a Circuit Using Wires. Let $T_i[a][b]$ denote the element located at column
$a$ and row $b$ in $T_i$. The column Wires contains wiring information, and we denote
the column Wires from $T_i$ to $T_k$ by $Wires_{[i \to k]}$. Looking at the column Wires
alone, $Wires(v)$ denotes the $v$th row of this column in the plaintext form. For example,
$Wires_{[i \to k]}(2) = (*, *, sk_{k3L}, sk_{k4L})$ in Figure 2. We also use $Wire(v, w)$ to denote
the $w$th element of $Wires(v)$. If $Wire_{[i \to k]}(v, w) \ne *$, it means that $T_i[Out][v] =
T_k[In][w]$. In Figure 2, for example, we have $Wire_{[i \to k]}(2, 3) \ne *$ because $T_i[Out][2] =
T_k[In_L][3] = E[0]$.

This wiring information helps the circuit computation to proceed correctly. The computation
proceeds in order from input gates to output gates. In Figure 2, for example, if

5 If $G_i$ has another outgoing wire, say to $G_m$, $T_i$ will have another column $Wires_{[i \to m]}$.
Connecting the Gates: Fill in Wires Columns.

— For every $Wire_{[i \to k]}(v, w)$ of an intermediate gate $T_i$, run M-CODE for $c_{out} = T_i[Out][v]$,
$c_{in} = T_k[In][w]$ and $c_{key} = T_k[SK][w]$, with the key $z = T_i[PK_L][v] \cdot T_i[PK_R][v]$.

— For every $Wire_{[j \to k]}(v, w)$ of an input gate $T_j$, run M-CODE for $c_{out} = T_j[Out][v]$,
$c_{in} = T_k[In][w]$ and $c_{key} = T_k[SK][w]$, with the trivial key $z = g^0$.

Depending on the circuit topology, the subscript of a column may differ (e.g., In_L, In_R, or In).

Local Computation. Each party computes the output of C using the M-CODE transcripts on
the input gates.

Fig. 3. Online Stage of P1


row 2 of $T_i$ and row 1 of $T_j$ are on the computation path, then row 3 of $T_k$ is also on
the computation path because $w = 3$ is the only row where $Wire_{[i \to k]}(2, w) \ne *$ and
$Wire_{[j \to k]}(1, w) \ne *$.
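
The uniqueness of the next row on the computation path can be checked on plaintext tables. The Python sketch below is only an illustration under loud assumptions: it works with unencrypted, shuffled NAND tables (in the protocol every entry is encrypted and the wiring is produced obliviously via M-CODE), and the names `shuffled_nand_table`, `wire_matrix` and `next_row` are hypothetical.

```python
import random

def shuffled_nand_table(rng):
    """Plaintext NAND table with shuffled rows; each row is (in_left, in_right, out)."""
    rows = [(a, b, 1 - (a & b)) for a in (0, 1) for b in (0, 1)]
    rng.shuffle(rows)
    return rows

def wire_matrix(t_pred, t_succ, side):
    """wire[v][w] is True iff the output of predecessor row v equals the input
    of successor row w on the given side (0 = left input, 1 = right input)."""
    return [[t_pred[v][2] == t_succ[w][side] for w in range(4)] for v in range(4)]

def next_row(left, right, v, u):
    """Successor rows reachable from row v of the left predecessor and row u
    of the right predecessor; exactly one row qualifies for a NAND table."""
    return [w for w in range(4) if left[v][w] and right[u][w]]
```

Each predecessor row is wired to two successor rows on its own side, and intersecting the left and right wirings leaves exactly one successor row, matching the example in the text.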


Constructing Wires. We implement each $Wire_{[i \to k]}(v, w)$ using an M-CODE transcript
for $c_{out} = T_i[Out][v]$, $c_{in} = T_k[In][w]$, $c_{key} = T_k[SK][w]$, and $z = T_i[PK][v]$. This
directly satisfies the requirements of encrypted wiring and oblivious wiring generation.
Conditional exposure is achieved by executing M-CODE protocols in the input layer
with a trivial public key $z = 1$, so that the wiring information in the input layer is
known to every party.

The description of P1 can be found in Figure 3. Running the online stage takes
two rounds. The communication complexity of P1 is $O(nk|C|)$ (plus the NIZK, if we
assume the CRS case) where $|C|$ is the size of the circuit.


3.2 Second Protocol (P2) 


The idea of P2 is that in a preprocessing stage, the parties generate a garbled circuit,
using Yao’s technique, of a universal circuit. The garbled circuit has a restriction on
the keys of input wires that allows the online computation to take only one round in
our model, as opposed to the two-round OT based approach of Yao. The preprocessing
stage can be done in a constant number of rounds, using general MPC protocols
[IPS08].


Preprocessing Stage: Garbling Universal Circuit. The first step is to establish a 
global public key y for ElGamal encryption. The computing parties have shares of the 
corresponding secret key x. In contrast to protocol P1, however, here, ElGamal encryp- 
tion is used only for input layer. 

Next, a garbled circuit for the universal circuit is generated, using Yao’s garbled circuit
technique [Y82]. In the generation procedure, for each wire $i$, two random keys, $w_i^0$
and $w_i^1$, are generated. The key $w_i^0$ (resp., $w_i^1$) represents 0 (resp., 1) for wire $i$. For each
gate $G_j$, a truth table $T_j$ is generated. In each table, a private key encryption (denoted


6 Depending on the circuit topology, if this is a left input or right input to the gate, the pair
$(c_{in}, c_{key})$ may also be $(T_k[In_L][w], T_k[SK_L][w])$ or $(T_k[In_R][w], T_k[SK_R][w])$.
Fig. 4. Garbled Truth Tables for the Gates ($G_i$, $G_j$, $G_k$, $G_\ell$). The topology of the gates is given
on the right. $G_j$ is an input gate, $G_\ell$ is an output gate, and $G_i$, $G_k$ are intermediate gates. Table
$T_x$ is the truth table describing gate $G_x$. $y$ is the global public key. Encryption $\hat{E}$ is a private key
encryption based on a pseudorandom function with efficiently verifiable range [LP09].


$(G, \hat{E}, \hat{D})$) with efficiently verifiable range (based on a pseudorandom function) is used
[LP09]. Figure 4 shows the structure of the garbled circuit.


— Recall that we assume all output gates are identity gates, with only one incoming
wire and only two rows in the corresponding truth table. Each row encrypts the
Boolean value represented by the corresponding wire, and the rows are randomly
shuffled. An example is given in Figure 4: in the first row of table $G_\ell$, the input value
is 1 (the key $w_k^1$ represents 1), and it encrypts 1, which is the output value of this
row.

— For all other gates, each gate has two incoming wires and four rows. Each row
encrypts a key for the outgoing wire, which represents the appropriate Boolean value
of NAND of the incoming wires' values, and the rows are randomly shuffled. For
example, in $G_k$ of Figure 4, the first row encrypts $w_k^0$, the representation of 0 for wire
$k$, since NAND of the values that the keys of the incoming wires represent (i.e., the
value 1 represented by $w_i^1$ in wire $i$, and the value 1 represented by $w_j^1$ in wire $j$)
is 0.
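
The gate-garbling step described above can be illustrated with a minimal Python sketch. It is a toy under explicit assumptions and not the construction of the paper or of [LP09]: SHA-256 stands in for the pseudorandom function, a block of zero bytes plays the role of the efficiently verifiable range, and all helper names (`enc`, `dec`, `new_wire`, `garble_nand`, `evaluate`) are hypothetical.

```python
import hashlib
import secrets

KEY_LEN = 16            # bytes per wire key (illustrative choice)
TAG = b"\x00" * 16      # zero padding that makes a correct decryption recognizable

def _pad(k1: bytes, k2: bytes, nonce: bytes) -> bytes:
    # SHA-256 as a stand-in PRF keyed by both incoming wire keys (assumption).
    return hashlib.sha256(k1 + k2 + nonce).digest()   # 32 bytes

def enc(k1: bytes, k2: bytes, msg: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(msg + TAG, _pad(k1, k2, nonce)))
    return nonce + body

def dec(k1: bytes, k2: bytes, ct: bytes):
    nonce, body = ct[:16], ct[16:]
    plain = bytes(a ^ b for a, b in zip(body, _pad(k1, k2, nonce)))
    # Accept only if the tag is intact: the "verifiable range" check.
    return plain[:KEY_LEN] if plain[KEY_LEN:] == TAG else None

def new_wire():
    # (key representing 0, key representing 1)
    return (secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN))

def garble_nand(w_i, w_j, w_k):
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            out_bit = 1 - (a & b)                       # NAND of the input bits
            rows.append(enc(w_i[a], w_j[b], w_k[out_bit]))
    secrets.SystemRandom().shuffle(rows)                # hide the row order
    return rows

def evaluate(rows, key_i, key_j):
    hits = [p for p in (dec(key_i, key_j, r) for r in rows) if p is not None]
    assert len(hits) == 1, "exactly one row should decrypt"
    return hits[0]
```

Holding one key per incoming wire, an evaluator learns exactly one outgoing key and nothing about which Boolean value it represents.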


To construct a secure protocol for MP-CED, we depart from the traditional Yao garbled
circuit technique by imposing a restriction on the input wires.


— A random element $h \in G_q$ is chosen, which no party knows, and $H = E_y(h)$ is
published. We emphasize that $H$ is generated once and for all. In other words, every
instance of the garbled universal circuit can use the same $H$.


— For input wire $i$, two keys $w_i^0, w_i^1 \in G_q$ are randomly generated, conditioned on
$w_i^1 = h \cdot w_i^0$. Only the encryption of the first key, $\hat{w}_i^0 = E_y(w_i^0)$, is published.


Since we garble a universal circuit, it suffices to know a bound on the sizes of circuits
to be evaluated later. A universal circuit of size $O(k \log k)$ can accept circuits of size $k$
as inputs [KS08].

Input contribution is performed such that for input $b \in \{0,1\}$, a ciphertext $c =
(c_1, c_2) = E_y(h^b)$ is published.


7 Roughly speaking, in such an encryption scheme, given a ciphertext and a key, it is efficiently 
verifiable whether the given ciphertext was encrypted under the given key. This helps comput- 
ing parties to correctly compute the garbled circuit. 
— When input is 0, publish $E_y(1)$.

— When input is 1, publish a re-encryption of $H$ (recall $H = (H_1, H_2) = E_y(h)$).

A proof of knowledge is added:
$PK\{r : (c_1 = g^r \wedge c_2 = y^r) \vee (c_1 = H_1 g^r \wedge c_2 = H_2 y^r)\}$.

Online Stage: Obtaining keys for input-wires. Computing parties need to obtain a
key, for each wire $j$, that represents the Boolean value $b$ that the corresponding input
ciphertext encrypts (that is, $w_j^b$). But the key should not leak any information about
the input ciphertext. Our protocol meets this requirement by using the homomorphism of
ElGamal encryption. Let $c_j$ be the ciphertext of contributed input $b \in \{0, 1\}$ for input
wire $j$. Computing parties work as follows:


— For every input wire $j$, compute $W_j = \hat{w}_j^0 \cdot c_j$ locally using the homomorphism of
ElGamal encryption. Then, decrypt $W_j$ via threshold decryption by the computing parties
using their shares of $x$. This gives $w_j^b$, which matches the input $b$.


— Each party computes the output of $C$ using the keys $w_j^b$ locally.


Running the online stage in P2 takes one round. The communication complexity of
P2 is $O(nk|C| \log |C|)$ (plus the NIZK, if we assume the CRS case) where $|C|$ is the
size of the circuit.
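
The way the input-wire key is obtained from the homomorphism can be sketched as follows. This is only an illustration under stated assumptions: a tiny toy group (safe prime 1019, subgroup order 509, generator 4), a single secret key standing in for threshold decryption, a single dealer standing in for the joint preprocessing, and no proofs of knowledge. The names `contribute` and `input_key` are hypothetical.

```python
import secrets

P, Q, G = 1019, 509, 4   # toy parameters: G generates the order-Q subgroup of Z_P^*

def rand_exp() -> int:
    return secrets.randbelow(Q - 1) + 1          # uniform in [1, Q-1]

def enc(y: int, m: int) -> tuple:
    r = rand_exp()
    return (pow(G, r, P), m * pow(y, r, P) % P)

def mul(c: tuple, d: tuple) -> tuple:
    # Component-wise product encrypts the product of the two plaintexts.
    return (c[0] * d[0] % P, c[1] * d[1] % P)

def dec(x: int, c: tuple) -> int:
    # Single-key stand-in for threshold decryption: c2 * c1^(-x).
    return c[1] * pow(c[0], Q - x, P) % P

# Preprocessing (done jointly in the protocol; one dealer here for brevity).
x = rand_exp()
y = pow(G, x, P)
h = pow(G, rand_exp(), P)
H = enc(y, h)                                    # published once and for all
w0 = pow(G, rand_exp(), P)                       # key representing 0
w1 = h * w0 % P                                  # key representing 1
w0_hat = enc(y, w0)                              # only this encryption is published

def contribute(b: int) -> tuple:
    # b = 0: encryption of 1 = h^0; b = 1: re-encryption of H, i.e. of h^1.
    return enc(y, 1) if b == 0 else mul(H, enc(y, 1))

def input_key(c: tuple) -> int:
    return dec(x, mul(w0_hat, c))                # decrypts to w0 * h^b
```

Multiplying the published encryption of $w^0$ by the contributed ciphertext yields an encryption of $w^0 \cdot h^b$, which is exactly the key for bit $b$.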


3.3 Discussion 


MP-CED vs. MPC with Preprocessing. General MPC and MP-CED can be reduced to 
each other. 


— Given a protocol $\pi$ for MP-CED, we can construct a protocol $\pi'$ for MPC with
preprocessing, as follows. In the preprocessing stage of $\pi'$, the parties share an encryption
key. In the online stage of $\pi'$, each party publishes an encryption of its input under the
shared key, and the parties follow protocol $\pi$. The resulting MPC protocol $\pi'$ requires
one more online round than the underlying protocol $\pi$. This approach is implicitly
used in [FH96, JJ00, CDN01, DN03].

— Given a protocol $\pi'$ for MPC, we can construct a protocol $\pi$ for MP-CED, as follows.
In MP-CED, the parties share a secret key, and the inputs are encrypted. Protocol $\pi$
should compute $C$ on these given input ciphertexts. This can be done by the parties
running protocol $\pi'$ using a circuit $C'$ derived from $C$. Circuit $C'$ consists of
two stages: the first stage of $C'$ gets shares of the secret key and the ciphertexts,
and decrypts the ciphertexts to give plaintexts. The second stage of $C'$ essentially
evaluates $C$ on these plaintext inputs from the first stage. In running the protocol $\pi'$,
each party's input is its share of the secret key. Circuit $C'$ has more gates than $C$.
However, if the round complexity of $\pi'$ does not depend on the depth of the circuit,
then the round complexity of $\pi$ is the same as the round complexity of $\pi'$.


On Basing MP-CED on Doubly-Homomorphic Encryption. Recently, Gentry con- 
structed a doubly homomorphic encryption scheme using ideal lattices [G09], which 


8 In fact, any homomorphic encryption can be used. We chose to use ElGamal encryption since 
it is already used in P1. 
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solves the CED problem. Since our goal is to give a round-efficient protocol, it is an 
interesting question whether doubly-homomorphic encryption allows non-interactive 
secure computation. However, this seems unlikely. 


— (Threshold Decryption.) It’s not known whether Gentry’s scheme supports threshold 
decryption. Thus, there has to be at least one party which can decrypt ciphertexts by 
itself. If this party sees the inputs (which are encrypted and published in the MP-CED 
model), it can decrypt private inputs of other parties and break the security. Thus, 
there must be a separation between parties who can decrypt and parties who get 
access to the input and intermediate ciphertexts. 

— (Malicious Parties.) Parties without decryption capability would compute a circuit on
encrypted inputs using the double homomorphism. In order for the protocol to compute
the output in plaintext form, they have to submit some ciphertexts to a party with
decryption capability. In the malicious setting, to make sure that they applied the double
homomorphism correctly, some kind of zero-knowledge proof should be added to the
ciphertexts they submit. However, it is not clear how such a proof can be constructed
when the verifier has the decryption capability: as mentioned above, it must not see
the input ciphertexts.


The above issues also stand in the way of achieving MPC protocols secure against an active
adversary using doubly homomorphic encryption.
A Explicit Preprocessing of P1


During the preprocessing stage, the parties can generate a polynomial number of garbled
gates, which can later be used for evaluating circuits. Therefore it suffices to know a
bound on the sizes of circuits to be evaluated later. We show how to generate such truth
tables explicitly given $y$ as a global public key.

Throughout, each bit $b$ will be encrypted with plaintext $g^b$. Denote by $\langle m \rangle$ a simple
ElGamal ciphertext (with randomness $r = 0$): $(1, m)$. For an ElGamal ciphertext $c$ for
a bit, its negation $\neg c$ is defined as $\langle g^1 \rangle / c$. For two ElGamal ciphertexts $a = (a_1, a_2)$ and
$b = (b_1, b_2)$, define $\mathrm{ZKe}_u(a, b)$, the proof that $b$ is a re-encryption of $a$ with public key
$u$, as $PK\{r : b_1 = g^r a_1 \wedge b_2 = u^r a_2\}$. When the public key is not specified, $\mathrm{ZKe}$ means
$\mathrm{ZKe}_y$. The construction details can be found in Appendix A.3.


A.1 Preliminaries: Joint Generation of Garbled Gates 


We associate a gate with the truth table for it. The entries of the truth tables are encrypted 
Boolean values, and the rows of each truth table are permuted, such that only a threshold 
of the parties can (1) recover any plaintext and (2) learn the permutation of the rows. 
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Sampling a Random Encrypted Boolean Value. In this protocol, n parties perform 
an oblivious analogue of XORing their respective random bits in $n$ rounds. In our case,
the semantic security of ElGamal and the soundness of the attached proofs guarantee that
corrupt parties cannot make their bit choices depend on the bits of other parties.


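1. Each party $P_i$ selects $a_i \in_R \{0,1\}$ and computes $\tilde{a}_i = E_y(g^{a_i})$,
$\pi_i = \mathrm{ZKe}(\langle g^0 \rangle, \tilde{a}_i) \vee \mathrm{ZKe}(\langle g^1 \rangle, \tilde{a}_i)$, and broadcasts $(\tilde{a}_i, \pi_i)$.
Let $S = \{j : \pi_j \text{ is valid}\}$. Set $\tilde{a} \leftarrow \tilde{a}_{\min}$ where $\min = \min_{j \in S} j$.

2. For $j = 2, \ldots, |S|$:

Let $i$ be the $j$-th smallest element in $S$. $P_i$ computes an encryption $d_i$ such that $d_i =
(d_{i1}, d_{i2})$ is a re-encryption of $\tilde{a}$ if $a_i = 0$ or a re-encryption of $\neg\tilde{a}$ otherwise. Then $P_i$
broadcasts $(d_i, \psi_i)$ where

$\psi_i = (\mathrm{ZKe}(\langle g^0 \rangle, \tilde{a}_i) \wedge \mathrm{ZKe}(\tilde{a}, d_i)) \vee (\mathrm{ZKe}(\langle g^1 \rangle, \tilde{a}_i) \wedge \mathrm{ZKe}(\neg\tilde{a}, d_i))$.

If $\psi_i$ is valid, then each party sets $\tilde{a} \leftarrow d_i$.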


As in computing XOR, it is enough that one of the bits is random (or, in our case, that one
party is honest) to guarantee a random output, as long as corrupt parties cannot have
their bit choices depend on the bits of other parties. The invariant of the protocol is that
at the end of each round the ciphertext $\tilde{a}$ encrypts the exclusive-or of the $a_i$'s so far.
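
The invariant can be checked on a toy instance. The following Python sketch is an illustration only, under several assumptions: a tiny group (safe prime 1019, subgroup order 509, generator 4), a single decryption key instead of threshold decryption, and all zero-knowledge proofs omitted. The helper names (`enc_bit`, `negate`, `reencrypt`, `sample_encrypted_bit`) are hypothetical.

```python
import secrets

P, Q, G = 1019, 509, 4   # toy parameters: G generates the order-Q subgroup of Z_P^*

def inv(a: int) -> int:
    return pow(a, P - 2, P)                      # inverse modulo the prime P

def enc_bit(y: int, b: int) -> tuple:
    r = secrets.randbelow(Q - 1) + 1
    return (pow(G, r, P), pow(G, b, P) * pow(y, r, P) % P)

def negate(c: tuple) -> tuple:
    # <g^1> / c : encrypts g^(1-b) whenever c encrypts g^b.
    return (inv(c[0]), G * inv(c[1]) % P)

def reencrypt(y: int, c: tuple) -> tuple:
    r = secrets.randbelow(Q - 1) + 1
    return (c[0] * pow(G, r, P) % P, c[1] * pow(y, r, P) % P)

def dec_bit(x: int, c: tuple) -> int:
    m = c[1] * pow(c[0], Q - x, P) % P
    return {1: 0, G: 1}[m]                       # plaintext is g^0 or g^1

def sample_encrypted_bit(y: int, bits: list) -> tuple:
    acc = enc_bit(y, bits[0])                    # first valid party's ciphertext
    for a in bits[1:]:
        # Each later party re-encrypts acc (a = 0) or its negation (a = 1).
        acc = reencrypt(y, negate(acc) if a else acc)
    return acc
```

Decrypting the final ciphertext yields the XOR of all contributed bits, although no single party sees the others' bits in the clear.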


Generating a Garbled IDENTITY Gate. First, run the procedure of sampling
a random encrypted Boolean value. Let the output of the procedure be
$\tilde{a}$. The first row of an IDENTITY gate is $\tilde{a}$, and the second row is computed
by negating $\tilde{a}$.


Generating a Garbled NAND Gate. 


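1. Each party $P_i$ selects $a_i, b_i \in_R \{0,1\}$ and computes $\tilde{a}_i = E_y(g^{a_i})$, $\tilde{b}_i = E_y(g^{b_i})$,
$\pi_i = \mathrm{ZKe}(\langle g^0 \rangle, \tilde{a}_i) \vee \mathrm{ZKe}(\langle g^1 \rangle, \tilde{a}_i)$, and $\phi_i = \mathrm{ZKe}(\langle g^0 \rangle, \tilde{b}_i) \vee \mathrm{ZKe}(\langle g^1 \rangle, \tilde{b}_i)$, and broadcasts
$(\tilde{a}_i, \tilde{b}_i, \pi_i, \phi_i)$.

2. Run the procedure of sampling random encrypted Boolean values with the $\tilde{a}_i$'s. Let $\tilde{a}$ be the
output of the procedure. Let $S = \{j : \pi_j \text{ and } \phi_j \text{ are valid}\}$. Set $\tilde{b} \leftarrow \langle g^0 \rangle$ and $\widetilde{ab} \leftarrow \langle g^0 \rangle$.

3. For $j = 1, \ldots, |S|$: Let $i$ be the $j$-th smallest element in $S$. $P_i$ computes encryptions $d_i$ and
$e_i$ such that

— If $b_i = 0$, then $d_i$ is a re-encryption of $\tilde{b}$ and $e_i$ is a re-encryption of $\widetilde{ab}$.

— If $b_i = 1$, then $d_i$ is a re-encryption of $\neg\tilde{b}$ and $e_i$ is a re-encryption of $\tilde{a} / \widetilde{ab}$.

Then $P_i$ broadcasts $(d_i, e_i, \psi_i)$ where $\psi_i = \psi_i^0 \vee \psi_i^1$ for
$\psi_i^0 = \mathrm{ZKe}(\langle g^0 \rangle, \tilde{b}_i) \wedge \mathrm{ZKe}(\tilde{b}, d_i) \wedge \mathrm{ZKe}(\widetilde{ab}, e_i)$ and
$\psi_i^1 = \mathrm{ZKe}(\langle g^1 \rangle, \tilde{b}_i) \wedge \mathrm{ZKe}(\neg\tilde{b}, d_i) \wedge \mathrm{ZKe}(\tilde{a}/\widetilde{ab}, e_i)$. If $\psi_i$ is valid, then
each party sets $\tilde{b} \leftarrow d_i$ and $\widetilde{ab} \leftarrow e_i$.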


The invariant of the loop is that at the end of each round the ciphertext
$\widetilde{ab}$ encrypts the exclusive-or of the $a b_i$'s so far. After $\tilde{a}$, $\tilde{b}$, $\widetilde{ab}$ are generated,
each party $P_i$ can complete the truth table by locally negating
the ciphertexts as described in the table.

[Table: NAND gate, built from $\tilde{a}$, $\tilde{b}$, and $\widetilde{ab}$.]
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A.2 Preliminaries: Jointly Recoverable Encrypted ElGamal Key Pairs 


Verifiable ElGamal Encryption of Discrete Logarithm. To generate a jointly recoverable
encrypted ElGamal key pair, we first introduce the following verifiable encryption
of discrete logarithm.

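Suppose $g^z$ is given. We want to encrypt $z$ in a verifiable manner. Let $z_i$ be the $i$-th rightmost
bit of $z$ for $i \in [k]$. The verifiable encryption is $\hat{E}_y(z) = (Z_0, \ldots, Z_{k-1}, \pi)$, where
$Z_i = E_y(g^{z_i 2^i})$ for $i \in [k]$. The proof $\pi$ is

$$\Big( \bigwedge_{i=0}^{k-1} \big( \mathrm{ZKe}(\langle g^0 \rangle, Z_i) \vee \mathrm{ZKe}(\langle g^{2^i} \rangle, Z_i) \big) \Big) \wedge \mathrm{ZKe}\Big( \langle g^z \rangle, \prod_{i=0}^{k-1} Z_i \Big).$$

When we get $(g^{z_0 2^0}, \ldots, g^{z_{k-1} 2^{k-1}})$ by decrypting $\hat{E}_y(z)$, $z$ can be extracted via
exhaustive search in polynomial time in $k$ because each $z_i$ is a bit.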

Note that the encryption scheme is homomorphic if we ignore the proof part. Multiplication
of two verifiable encryptions $\hat{E}_y(z) = (Z_0, \ldots, Z_{k-1})$ and $\hat{E}_y(w) = (W_0, \ldots, W_{k-1})$
is defined as $\hat{E}_y(z) \cdot \hat{E}_y(w) = (Z_0 \cdot W_0, \ldots, Z_{k-1} \cdot W_{k-1})$.


Generation of Jointly Recoverable Encrypted ElGamal Key Pairs. For simplicity, 
we omit the proof part of the verifiable encryption from the presentation below. Gener- 
ation of a key pair can be done as follows: 


1. Each party $P_j$ runs ElGamal key generation and obtains $(k_j, g^{k_j})$. It broadcasts $(g^{k_j}, \hat{E}_y(k_j))$.
2. Let $S$ be the set of parties whose encryptions are verified. In the PK column, $\prod_{j \in S} g^{k_j}$ is set.
In the SK column, $\prod_{j \in S} \hat{E}_y(k_j)$ is set.


Extraction of the secret key. Let $(Y_0, \ldots, Y_{k-1}) := \prod_{j \in S} \hat{E}_y(k_j)$. Let $g^{z_i}$ be the
decryption of $Y_i$. Then given $(g^{z_0}, \ldots, g^{z_{k-1}})$, we can extract the secret key $\sum_{j \in S} k_j =
\sum_i z_i$ by finding each $z_i$ via exhaustive search, which can be done efficiently since
$g^{z_i} \in \{g^0, g^{2^i}, g^{2 \cdot 2^i}, \ldots, g^{n 2^i}\}$.
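
The bitwise encryption, its component-wise product, and the exhaustive-search extraction can be sketched in Python. This is an illustration under stated assumptions, not the paper's implementation: a tiny toy group (safe prime 1019, subgroup order 509, generator 4), 8-bit exponents, a single decryption key in place of threshold decryption, and the proof part omitted. The function names are hypothetical.

```python
import secrets

P, Q, G = 1019, 509, 4   # toy parameters: G generates the order-Q subgroup of Z_P^*
K = 8                    # bit length of each encrypted exponent (toy size)

def enc(y: int, m: int) -> tuple:
    r = secrets.randbelow(Q - 1) + 1
    return (pow(G, r, P), m * pow(y, r, P) % P)

def dec(x: int, c: tuple) -> int:
    return c[1] * pow(c[0], Q - x, P) % P

def enc_exponent(y: int, z: int) -> list:
    # Z_i encrypts g^(z_i * 2^i), where z_i is the i-th rightmost bit of z.
    return [enc(y, pow(G, ((z >> i) & 1) << i, P)) for i in range(K)]

def multiply(Z: list, W: list) -> list:
    # Component-wise product of two bitwise encryptions (proofs ignored).
    return [(a[0] * b[0] % P, a[1] * b[1] % P) for a, b in zip(Z, W)]

def extract(x: int, Y: list, n_parties: int) -> int:
    total = 0
    for i, c in enumerate(Y):
        target = dec(x, c)
        # Exhaustive search: slot i holds g^(t * 2^i) with 0 <= t <= n_parties.
        t = next(t for t in range(n_parties + 1) if pow(G, t << i, P) == target)
        total += t << i
    return total
```

After multiplying the encryptions of several parties' keys, slot $i$ holds the number of parties whose $i$-th bit is set, scaled by $2^i$, so the search space per slot has only $n + 1$ candidates.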


A.3 Preprocessing of P1 


The preprocessing takes $2n$ rounds, since step 1.1 and step 1.2 can be executed
concurrently. This protocol is UC-secure, but for lack of space, we defer the proof of security
to the full version.


Step 1.1: Garbled Circuit Generation - Intermediate Gates. For each NAND gate,
run the procedure of joint generation of a garbled NAND gate in Appendix A.1 to fill
in the In and Out columns. For each pair of columns PK and SK, run the procedure of
jointly recoverable encrypted ElGamal key pairs in Appendix A.2. The above tasks
are executed in parallel.


Step 1.2: Garbled Circuit Generation - Output Gates 


1. Run the procedure of sampling random encrypted Boolean values in Appendix A.1, where
each party $P_i$ selects $a_i \in_R \{0,1\}$. Let $\tilde{a}$ be the output of the procedure and let $S = \{j :
P_j \text{ behaved honestly during the procedure}\}$. Fill in the In and Out columns as an IDENTITY gate.


9 Now in the online stage, $k$ instances of M-CODE are executed since $T[SK][w]$ contains $k$
ElGamal ciphertexts. The communication complexity blows up by a multiplicative factor of $k$.




S.G. Choi et al. 


In addition, run the procedures of jointly recoverable encrypted ElGamal key pairs in
Appendix A.2 to fill the columns PK and SK. Let $z_1$ and $z_2$ be the two keys in the column
PK.


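2. In order to fill the Final column, each party $P_i$ such that $i \in S$ broadcasts
$(\tilde{a}_{i,z_1} = E_{z_1}(g^{a_i}), \tilde{a}_{i,z_2} = E_{z_2}(g^{a_i}))$. Set $\tilde{a}_{z_1} \leftarrow \langle g^0 \rangle$, $\tilde{a}_{z_2} \leftarrow \langle g^0 \rangle$. Parties jointly
compute $E_{z_1}(g^{\oplus_i a_i})$ and $E_{z_2}(g^{1 - \oplus_i a_i})$. In particular, for $j = 1, \ldots, |S|$:

(a) Let $i$ be the $j$-th smallest element in $S$. $P_i$ computes encryptions $d_i$, $e_i$ such that $d_i$ (resp.
$e_i$) is a re-encryption of $\tilde{a}_{z_1}$ (resp. $\tilde{a}_{z_2}$) if $a_i = 0$ or a re-encryption of $\neg\tilde{a}_{z_1}$ (resp.
$\neg\tilde{a}_{z_2}$) otherwise. Then $P_i$ broadcasts $(d_i, e_i, \psi_i)$ where $\psi_i = \psi_i^0 \vee \psi_i^1$ for

$\psi_i^0 = \mathrm{ZKe}(\langle g^0 \rangle, \tilde{a}_i) \wedge \mathrm{ZKe}_{z_1}(\langle g^0 \rangle, \tilde{a}_{i,z_1}) \wedge \mathrm{ZKe}_{z_2}(\langle g^0 \rangle, \tilde{a}_{i,z_2}) \wedge \mathrm{ZKe}_{z_1}(\tilde{a}_{z_1}, d_i) \wedge \mathrm{ZKe}_{z_2}(\tilde{a}_{z_2}, e_i)$ and

$\psi_i^1 = \mathrm{ZKe}(\langle g^1 \rangle, \tilde{a}_i) \wedge \mathrm{ZKe}_{z_1}(\langle g^1 \rangle, \tilde{a}_{i,z_1}) \wedge \mathrm{ZKe}_{z_2}(\langle g^1 \rangle, \tilde{a}_{i,z_2}) \wedge \mathrm{ZKe}_{z_1}(\neg\tilde{a}_{z_1}, d_i) \wedge \mathrm{ZKe}_{z_2}(\neg\tilde{a}_{z_2}, e_i)$.

(b) If $\psi_i$ is valid, then each party sets $\tilde{a}_{z_1} \leftarrow d_i$ and $\tilde{a}_{z_2} \leftarrow e_i$. Otherwise, in the case of an
honest majority, parties collectively compute $a_i$ from threshold decryption using $(y_1, \ldots, y_n)$
and compute $\tilde{a}_{z_1}$, $\tilde{a}_{z_2}$ accordingly. In the case of an honest minority, the protocol aborts.

Finally, set $\tilde{a}_{z_2} \leftarrow \neg\tilde{a}_{z_2}$.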
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Abstract. We present a new construction of non-committing encryption 
schemes. Unlike the previous constructions of Canetti et al. (STOC '96) and of
Damgård and Nielsen (Crypto '00), our construction achieves all of the following
properties:

— Optimal round complexity. Our encryption scheme is a 2-round protocol, 
matching the round complexity of Canetti et al. and improving upon that in 
Damgård and Nielsen. 

— Weaker assumptions. Our construction is based on trapdoor simulatable 
cryptosystems, a new primitive that we introduce as a relaxation of those 
used in previous works. We also show how to realize this primitive based on 
hardness of factoring. 

— Improved efficiency. The amortized complexity of encrypting a single bit is 
O(1) public key operations on a constant-sized plaintext in the underlying 
cryptosystem. 

As a result, we obtain the first non-committing public-key encryption schemes 
under hardness of factoring and worst-case lattice assumptions; previously, such 
schemes were only known under the CDH and RSA assumptions. Combined 
with existing work on secure multi-party computation, we obtain protocols for 
multi-party computation secure against a malicious adversary that may adaptively 
corrupt an arbitrary number of parties under weaker assumptions than were 
previously known. Specifically, we obtain the first adaptively secure multi-party 
protocols based on hardness of factoring in both the stand-alone setting and the 
UC setting with a common reference string. 


Keywords: public-key encryption, adaptive corruption, non-committing encryp- 
tion, secure multi-party computation. 


1 Introduction 


Secure multi-party computation (MPC) allows several mutually distrustful parties to 
perform a joint computation without compromising, to the greatest extent possible, 
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the privacy of their inputs or the correctness of the outputs. An important criterion 
in evaluating the security guarantee is how many parties an adversary is allowed to 
corrupt and when the adversary determines which parties to corrupt. Ideally, we want 
to achieve the strongest notion of security, namely, against an adversary that corrupts 
an arbitrary number of parties, and adaptively determines who and when to corrupt 
during the course of the computation (and without assuming erasures$^1$). Even though
the latter is a very natural and realistic assumption about the adversary, most of the MPC 
literature only addresses security against a static adversary, namely one that chooses 
(and fixes) which parties to corrupt before the protocol starts executing. And if indeed 
such protocols do exist, it is important to answer the following question: 


What are the cryptographic assumptions under which we can realize 
MPC protocols secure against a malicious, adaptive adversary that 
may corrupt a majority of the parties? 


Towards answering this question, we revisit the problem of constructing
non-committing encryption schemes, a cryptographic primitive first introduced by Canetti
et al. [CFGN96] as a tool for building adaptively secure MPC protocols in the presence
of an honest majority. Informally, non-committing encryption schemes are semantically
secure, possibly interactive encryption schemes, with the additional property that a
simulator can generate special ciphertexts that can be opened to both a 0 and a 1. In a
more recent work, Canetti et al. [CLOS02] (extending [B98]) showed how to construct
adaptively secure oblivious transfer protocols starting from non-committing public-key
encryption schemes (i.e., the key generation algorithm must be non-interactive), which
may in turn be used to construct MPC protocols secure against a malicious, adaptive
adversary that may corrupt an arbitrary number of parties.

Unfortunately, the only known constructions of non-committing public-key
encryption schemes (PKEs) are based on the CDH and RSA assumptions [CFGN96], and
the construction exploits in a very essential way that these assumptions give rise to
families of trapdoor permutations with a common domain. If we allow for an interactive
key generation phase, Damgård and Nielsen [DN00], building on [CFGN96],
constructed 3-round non-committing encryption schemes based on a more general
assumption, that of simulatable PKEs, which may in turn be realized from DDH, CDH,
RSA and, more recently, worst-case lattice assumptions (see Figure 1).


1.1 Our Results 


First, we present a new construction of non-committing encryption schemes, which
simultaneously improves upon all of the previous constructions in [CFGN96, DN00]:


Optimal Round Complexity. We provide a construction of non-committing PKEs from
simulatable cryptosystems. Our construction is surprisingly simple (a twist to the
standard cut-and-choose techniques used in [DN00]) and also admits a fairly

1 Refer to [C00, Section 5.2] for a discussion on how trusted erasures may be a problematic
assumption.
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straight-forward simulation and analysis. In particular, our construction and the 
analysis are conceptually and technically simpler than those in [CFGN96, DN00];
we avoid having to analyze the number of ones in certain binomial distributions
as in [CFGN96] and to consider a subtle failure mode as in [DN00].

Reducing the assumptions. Informally, a simulatable PKE is an encryption scheme 

with special algorithms for obliviously sampling public keys and random cipher- 
texts without learning the corresponding secret keys and plaintexts; in addition, 
both of these oblivious sampling algorithms should be efficiently invertible. 
We define a weaker assumption, which we refer to as trapdoor simulatable 
cryptosystems, and prove that it is sufficient for our construction and analysis to 
go through. Roughly speaking, we provide the inverting algorithms in a simulatable 
cryptosystem with additional trapdoor information (hence the modifier “trapdoor’’), 
which makes it easier to design a simulatable cryptosystem. 

Improved efficiency. While the main focus of this work is feasibility results (notably, 
reducing the computational assumptions for both non-committing encryption 
schemes and adaptively secure MPC), we show how to combine a variant of our 
basic construction with the use of error-correcting codes to achieve better efficiency. 
That is, the amortized complexity of encrypting a single bit is O(1) public-key 
operations on a constant-sized plaintext in the underlying cryptosystem. 


Thus, we obtain the following. 


Theorem 1 (informal). There exists a black-box construction of a non-committing 
public-key encryption scheme, starting from any trapdoor simulatable cryptosystem. 


Factoring-Based Constructions. Next, we derive trapdoor simulatable cryptosystems
from a variant of Rabin's trapdoor permutations (c.f. [H99, FF02]) based on the
hardness of factoring Blum integers.


Theorem 2 (informal). Suppose factoring Blum integers is hard on average. Then, 
there exists a trapdoor simulatable cryptosystem. 


We stress that we do not know how to construct a simulatable cryptosystem under
the same assumptions; specifically, inverting the sampling algorithm for ciphertexts in
our construction without the trapdoor (the factorization of the Blum integer modulus)
appears to be as hard as factoring Blum integers. This shows that trapdoor simulatable
cryptosystems are indeed a meaningful and useful relaxation. In the process, we also
obtain the first factoring-based dense cryptosystems$^2$. When combined with enhanced
trapdoor permutations, this yields the first factoring-based non-interactive proofs of
knowledge [DP92].


Oblivious Transfer and MPC. We consider the applications of our main result to the
constructions of adaptively secure oblivious transfer and general MPC protocols in both
the stand-alone setting and the UC setting (c.f. [CLOS02, IPS08, CDSMW09]).


2 These are PKE schemes where a random string has an inverse polynomial probability of being
a valid public key.
[Figure 1 diagram. Solid: CDH, RSA → simulatable common-domain TDP → 2-round NCE;
DDH, LWE → simulatable PKE → 3-round NCE. Dashed: factoring BI → trapdoor simulatable
PKE → 2-round NCE.]


Fig. 1. Summary of previous results (solid lines) along with our contributions (dashed lines) 


Theorem 3 (informal). There exists a black-box construction of a 6-round
1-out-of-$\ell$ oblivious transfer protocol for strings in the $\mathcal{F}_{COM}$-hybrid model$^3$ in the UC setting
that is secure against a malicious, adaptive adversary, starting from any trapdoor
simulatable cryptosystem.


We add that if the oblivious key generation algorithm in the trapdoor simulatable 
cryptosystem achieves statistical indistinguishability (which is the case for all of the 
afore-mentioned constructions), then we obtain an OT protocol that is secure against a 
computationally unbounded malicious sender. While our OT protocol is not as efficient 
as that in the recent work of Garay, Wichs and Zhou (we incur an additional 
multiplicative overhead that is linear in the security parameter), our protocol along with 
our general framework offers several advantages: 


— In addition to relying on the $\mathcal{F}_{COM}$ functionality and a simulatable PKE (to
implement non-committing encryption) as in our work, the framework of Garay et al.
requires a so-called enhanced dual-mode cryptosystem. This is a relatively
high-level CRS-based primitive from [PVW08], augmented with two main additional
properties: the first has a flavor of oblivious sampling; the second requires that the
underlying CRS be a common random string (modulo some system parameters)
and not just a common reference string. This requirement is inherent to their
framework, since this CRS is generated using a coin-tossing protocol. This latter
requirement is very restrictive, and the only known construction of an enhanced
dual-mode cryptosystem is based on the quadratic residuosity assumption.

— Our protocol immediately handles 1-out-of-$\ell$ OT, whereas the protocol of Garay et al.
only addresses 1-out-of-2 OT, a limitation inherited from [PVW08].


Combined with [CLOS02, IPS08, CDSMW09], we obtain the following corollaries:


Corollary 1 (informal). Assuming the existence of trapdoor simulatable cryptosystems,
there exist adaptively secure multi-party protocols in the stand-alone setting and
in the $\mathcal{F}_{COM}$-hybrid model in the UC setting against a malicious adversary that may
adaptively corrupt any number of parties.


Specifically, we obtain the first adaptively secure multi-party protocols based on 
hardness of factoring in both the stand-alone setting and the UC setting with a common 
reference string. 


3 $\mathcal{F}_{COM}$ is an ideal functionality for commitment.
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1.2 Additional Related Work 


The problem of constructing encryption schemes that are secure against adaptive 
corruptions was first addressed in the work of Beaver and Haber [BH92]. They 
considered a simpler scenario where the honest parties have the ability to securely 
and completely erase previous states. For instance, an honest sender could erase 
the randomness used for encryption after sending the ciphertext, so that upon being 
corrupted, the adversary only gets to see the corresponding plaintext. An intermediate 
model, wherein we assume secure erasures for either the sender or receiver but not both 
(or, by limiting the adversary to corrupting at most one of the two parties), has been 
considered in several other works [LOO] [CHK05] [KO04]. 


Organization. We present an overview of our constructions in Section 2, preliminaries
in Section 3, the formulation of a trapdoor simulatable PKE in Section 4, our
factoring-based trapdoor simulatable PKE in Section 6, and our non-committing encryption
scheme in Section 5. In Section 7, we show the construction of a 6-round oblivious
transfer protocol.


2 Overview of Our Constructions 


At a high level, our non-committing PKE is similar to that from previous works
[KO04]. The receiver generates a collection of public keys in such
a way that it only knows an $\alpha$ fraction of the corresponding secret keys; this can
be achieved by generating an $\alpha$ fraction of the public keys using the key generation
algorithm and the remaining $1 - \alpha$ fraction obliviously. Similarly, the sender generates
a collection of ciphertexts in such a way that it only knows an $\alpha$ fraction of the
corresponding plaintexts. Previous constructions all work with the natural choice of
$\alpha = 1/2$ so that the simulator generates a collection of ciphertexts half of which
are encryptions of 0 and the other half are encryptions of 1. As noted in [KO04],
this is sufficient for obtaining non-committing PKEs wherein at most one party is
corrupted. Roughly speaking, the difficulty in handling simultaneous corruptions of
both the sender and the receiver with $\alpha = 1/2$ is that in the simulation, the receiver's
choice of the $\alpha$ fraction of keys completely determines the sender's choice of the
$\alpha$ fraction of ciphertexts, whereas in an actual honest encryption, these choices are
completely independent (we elaborate on this later in this section). The key insight
in our construction is to work with a smaller value of $\alpha$ (it turns out that 1/4 is good enough).


A Toy Construction. Consider the following encryption scheme, which is a
simplification of that in [KO04, DN00]. The receiver generates a pair of public keys $(PK_0, PK_1)$
by generating one key (selected at random) using the key-generation algorithm, and the
other using the oblivious sampling algorithm. To encrypt a bit $b$, the sender generates a
pair of ciphertexts $(C_0, C_1)$ as follows: pick a random bit $r$, set $C_r$ to be $\mathsf{Enc}_{PK_r}(b)$ and
choose $C_{1-r}$ using the oblivious sampling algorithm. To decrypt, the receiver decrypts
exactly one of $C_0, C_1$ using the secret key that it knows. This construction corresponds
to $\alpha = 1/2$ where $\alpha$ is the fraction of public keys for which the receiver knows the
secret key, and also the fraction of ciphertexts for which the sender knows the plaintext.
Observe that this encryption scheme has the following properties:
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— It has a constant decryption error of 1/4 if an obliviously sampled ciphertext is 
equally likely to decrypt to 0 or 1. As shown in [KO04], this error can be reduced
by standard repetition techniques.

— It tolerates corruption of either the sender or the receiver, but not both. Consider a
simulator that generates both of $(PK_0, PK_1)$ (along with $SK_0, SK_1$) using the
key-generation algorithm, and a ciphertext $(C_0, C_1)$ as follows: pick a random bit $\beta$,
and set $C_0$ to be $\mathsf{Enc}_{PK_0}(\beta)$ and $C_1$ to be $\mathsf{Enc}_{PK_1}(1 - \beta)$. Suppose the simulator
later learns that this is an encryption of 0. If only the sender is corrupted, the
simulator claims $r = \beta$ and that $C_{1-\beta}$ is obliviously sampled. If only the receiver
is corrupted, it claims that it knows $SK_\beta$ and that $PK_{1-\beta}$ is obliviously sampled.


We highlight two subtleties in the above simulation strategy. First, it achieves 0
decryption error (as opposed to 1/4 in an honest encryption); this can be fixed with a
somewhat more involved simulation strategy. This in turn becomes pretty complicated
once we use standard repetition techniques to reduce the decryption error. Next, it is
always the case in the simulation that either both $PK_0$ and $C_0$ are obliviously sampled,
or both $PK_1$ and $C_1$ are obliviously sampled. As such, this simulation strategy fails if
both the sender and the receiver are corrupted, because in an actual encryption, which of
$PK_0, PK_1$ and which of $C_0, C_1$ are obliviously sampled are determined independently.


Our Encryption Scheme. As noted in the introduction, the key insight in our
construction is to work with a small value of $\alpha$. In addition, following [DN00], we
use a random $k$-bit encoding of 0 and 1, where $k$ is the security parameter:

— The receiver generates $4k$ public keys $PK_1, \ldots, PK_{4k}$: $k$ of them are generated
using the key-generation algorithm, and the remaining $3k$ are generated using the
oblivious sampling algorithm. The receiver then sends $PK_1, \ldots, PK_{4k}$ along with
two random $k$-bit messages $M_0, M_1$.

— To encrypt a bit $b$, the sender sends $4k$ ciphertexts (one for each of $PK_1, \ldots, PK_{4k}$),
of which $k$ are encryptions of $M_b$, and the remaining ones are obliviously sampled.

— To decrypt, the receiver decrypts the $k$ ciphertexts for which it knows the
corresponding secret key. If any of the $k$ plaintexts matches $M_0$, it outputs 0 and
otherwise, it outputs 1.
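
The message flow of the three steps above can be sketched in Python. This is a toy that only illustrates correctness, under loudly stated assumptions: textbook ElGamal over $\mathbb{Z}_P^*$ with $P = 2^{61} - 1$ stands in for the (trapdoor) simulatable cryptosystem, "oblivious sampling" is modeled by drawing random group elements, $M_0, M_1$ are random group elements rather than $k$-bit strings, and $k = 64$. None of this is a secure instantiation, and all function names are hypothetical.

```python
import secrets

P = 2**61 - 1          # a Mersenne prime; toy modulus, NOT a secure parameter choice
G = 3                  # fixed base for the toy ElGamal (assumption)
K = 64                 # toy security parameter k
rng = secrets.SystemRandom()

def rand_elem() -> int:
    return rng.randrange(1, P)

def keygen():
    sk = rng.randrange(1, P - 1)
    return pow(G, sk, P), sk

def enc(pk: int, m: int) -> tuple:
    r = rng.randrange(1, P - 1)
    return (pow(G, r, P), m * pow(pk, r, P) % P)

def dec(sk: int, c: tuple) -> int:
    return c[1] * pow(c[0], P - 1 - sk, P) % P   # c2 * c1^(-sk) by Fermat

def receiver_setup():
    known = set(rng.sample(range(4 * K), K))     # indices with a known secret key
    pks, sks = [], {}
    for i in range(4 * K):
        if i in known:
            pk, sks[i] = keygen()
        else:
            pk = rand_elem()                     # stand-in for oblivious key sampling
        pks.append(pk)
    return pks, sks, (rand_elem(), rand_elem())  # public keys, secret keys, (M0, M1)

def encrypt_bit(pks, msgs, b: int):
    real = set(rng.sample(range(4 * K), K))      # positions holding encryptions of M_b
    return [enc(pks[i], msgs[b]) if i in real else (rand_elem(), rand_elem())
            for i in range(4 * K)]

def decrypt_bit(sks, msgs, cts) -> int:
    return 0 if any(dec(sk, cts[i]) == msgs[0] for i, sk in sks.items()) else 1
```

Decryption of a 0 succeeds whenever the receiver's $k$ known keys and the sender's $k$ real ciphertexts overlap in at least one position, which fails only with probability exponentially small in $k$.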


Encoding 0 and 1 randomly as $M_0$ and $M_1$ is useful for two reasons:


— That an obliviously sampled ciphertext is equally likely to decrypt to 0 or 1 is
no longer needed to guarantee correctness (c.f. [DN00]). Indeed, reasoning about
decryptions of obliviously sampled ciphertexts is non-trivial for the lattice-based
simulatable PKEs in [GPV08].

— Constructing a simulator becomes much easier as we avoid having to generate 
distributions over k independent biased bits conditioned on the majority of the 
bits being 0, say. Generating such distributions arises for instance in 
and is related to the first subtlety associated with the naive simulation strategy. 
In our construction, the simulated ciphertext comprises $k$ encryptions of $M_0$, $k$
encryptions of $M_1$ and $2k$ obliviously generated ciphertexts. Having these extra $2k$
obliviously generated ciphertexts (which is possible because $\alpha < 1/2$) is crucial
for handling simultaneous corruptions of the sender and the receiver.
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Trapdoor Simulatable PKEs from Factoring. Our factoring-based trapdoor simulat- 
able PKE construction consists of two main steps. First, we modify the Rabin trapdoor 
permutations based on squaring modulo a Blum integer so that it remains a permutation
over an arbitrary odd integer modulus. This relies on the following number-theoretical
structural lemma implicit in EA: 


Let $N$ be an arbitrary odd $k$-bit integer, and let $Q_N = \{a^{2^k} \pmod N \mid a \in
\mathbb{Z}_N^*\}$. Then, the map $\psi_N : x \mapsto x^2$ defines a permutation over $Q_N$.


We also provide an efficient algorithm for inverting $\psi_N$ given the factorization of $N$.
Note that the standard algorithm for computing square roots does not guarantee that the
output lies in $Q_N$. Moreover, the probability that a random square root lies in $Q_N$ may
be exponentially small, so we cannot repeatedly compute random square roots until we
find one in $Q_N$; it is also not clear a priori how to test membership in $Q_N$ even given
the factorization of $N$.

The next step transforms the family of trapdoor permutations $\psi_N$ acting on the
domain $Q_N$ into a family of “enhanced” trapdoor permutations with the same domain
$Q_N$, using an idea from [G04, Section C.1]. The latter has the property that we can
obliviously sample a random element $y$ in $Q_N$ so that given $y$ along with the coin tosses
used to sample $y$, it is infeasible to compute the preimage of $y$ under the permutation
(note that the naive algorithm for sampling a random element of $Q_N$ gives away
its preimage under $\psi_N$). We will need the oblivious sampling algorithm for a random
element in $Q_N$ in our oblivious sampling algorithm for random ciphertexts. We will also
need to realize trapdoor invertibility for the latter, which requires an efficient algorithm
that, given the factorization of $N$ and an element $y$ in $Q_N$, outputs a random $2^k$'th root
of $y$.⁵ Note that iteratively computing random square roots $k$ times does not work: after
computing the first square root, we may not end up with a $2^{k-1}$'th power.


3 Preliminaries 


If $A$ is a probabilistic polynomial time (hereafter, ppt) algorithm that runs on input $x$,
$A(x)$ denotes the random variable according to the distribution of the output of $A$ on
input $x$. We denote by $A(x; r)$ the output of $A$ on input $x$ and random coins $r$. To
simplify the notation, we will often omit quantifying over the distribution for $r$; it will
usually be clear from the context when $r$ is not fixed, that it is drawn from the uniform
distribution over strings of the appropriate length.

We assume that the reader is familiar with the standard definitions of public-key
encryption schemes and semantic security (cf. [GM84], [G04]). We stress that we allow
decryption errors that are exponentially small in $k$:


4 It was shown in earlier work that $\psi_N$ defines a permutation over the subgroup $O_N$ of
$\mathbb{Z}_N^*$ of odd order, and that $O_N$ contains $Q_N$; it turns out that $O_N = Q_N$. While $Q_N$
is trivially sampleable, it is not clear a priori how to sample from $O_N$.

5 If we are given just $N$ and not its factorization, this problem is at least as hard as factoring
random Blum integers. This is in essence why we only obtain a factoring-based trapdoor
simulatable PKE and not a simulatable PKE.

Definition 1 (encryption scheme). A triple (Gen, Enc, Dec) is an encryption scheme
if Gen and Enc are ppt algorithms and Dec is a deterministic polynomial-time algorithm
such that for every message $m \in \{0,1\}^*$ of polynomial length,
$\Pr[\mathsf{Gen}(1^k) \to (PK, SK),\ \mathsf{Enc}_{PK}(m) \to c;\ \mathsf{Dec}_{SK}(c) \neq m] < 2^{-\Omega(k)}$.


Non-committing Encryption. For simplicity, we present the definition of a non- 
committing public-key encryption scheme for single-bit messages: 


Definition 2 (non-committing encryption [CFGN96]). A non-committing (bit) encryption
scheme consists of a tuple (NCGen, NCEnc, NCDec, NCSim) where (NCGen,
NCEnc, NCDec) is an encryption scheme and NCSim is the simulation algorithm that
on input $1^k$ outputs $(e, c, \sigma_G^0, \sigma_E^0, \sigma_G^1, \sigma_E^1)$ with the following property: for $b = 0, 1$ the
following distributions are computationally indistinguishable:


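— the joint view of an honest sender and an honest receiver in a normal encryption
of $b$:
$\{(e, c, \sigma_G, \sigma_E) \mid (e, d) = \mathsf{NCGen}(1^k; \sigma_G),\ c = \mathsf{NCEnc}_e(b; \sigma_E)\}$


— simulated view of an encryption of $b$:
$\{(e, c, \sigma_G^b, \sigma_E^b) \mid \mathsf{NCSim}(1^k) \to (e, c, \sigma_G^0, \sigma_E^0, \sigma_G^1, \sigma_E^1)\}$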


It follows from the definition that a non-committing encryption scheme is also 
semantically secure. 


Encrypting longer messages. Starting with a non-committing bit encryption scheme 
(NCGen, NCEnc, NCDec, NCSim), we may encrypt a longer message of length n by 
generating n independent public keys using NCGen, encrypting each bit of the message 
using a different public key and then concatenating the n ciphertexts. Note that this is 
different from the case of semantically secure encryption, where we may encrypt each 
bit using the same public key. 


4 Trapdoor Simulatable Public Key Encryption 


An $\ell$-bit trapdoor simulatable encryption scheme consists of an encryption scheme
(Gen, Enc, Dec) augmented with (oGen, oRndEnc, rGen, rRndEnc). Here, oGen and
oRndEnc are the oblivious sampling algorithms for public keys and ciphertexts, and
rGen and rRndEnc are the respective inverting algorithms.⁶ We require that, for all
messages $m \in \{0,1\}^\ell$, the following distributions are computationally indistinguishable:


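$\{\mathsf{rGen}(r_G),\ \mathsf{rRndEnc}(r_G, r_E, m),\ PK,\ c \mid (PK, SK) = \mathsf{Gen}(1^k; r_G),\ c = \mathsf{Enc}_{PK}(m; r_E)\}$
and $\{\hat{r}_G,\ \hat{r}_E,\ \hat{PK},\ \hat{c} \mid (\hat{PK}, \bot) = \mathsf{oGen}(1^k; \hat{r}_G),\ \hat{c} = \mathsf{oRndEnc}_{\hat{PK}}(1^k; \hat{r}_E)\}$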


It follows from the definition that a trapdoor simulatable encryption scheme is also 
semantically secure. 


6 Existence of such inverting algorithms is called trapdoor invertibility. Compared to the
simulatable cryptosystem (without trapdoor) defined in [DN00], rGen (resp. rRndEnc) takes
$r_G$ (resp. $(r_G, r_E, m)$) as the additional trapdoor information.

Encrypting longer messages. We note that if we started only with a trapdoor simulatable
PKE for single bits, we may encrypt a longer message of length $n$ by generating a single
public key PK using Gen, and concatenating the encryptions of each bit of the message under PK.


5 Non-Committing Encryption from Weaker Assumptions 


Theorem 4. Suppose there exists a trapdoor simulatable encryption scheme. Then, 
there exists a non-committing encryption scheme as well as a universally composable 
oblivious transfer protocol secure against semi-honest, adaptive adversaries. 


We show how to construct a non-committing bit encryption scheme (NCGen, NCEnc,
NCDec, NCSim) from a $k$-bit trapdoor simulatable PKE (Gen, Enc, Dec) (augmented
with (oGen, oRndEnc, rGen, rRndEnc)). This is sufficient to establish the theorem by
the connection between encrypting single bits and multiple bits as discussed in Sections
3 and 4. Our construction is presented in Figures 2 and 3.


Correctness. We begin by establishing correctness. 


— Assume that the input $[c_1, \ldots, c_{4k}]$ to the decryption algorithm is a random
encryption of 0. Recall that $J = \{\mathsf{Dec}_{SK_i}(c_i) \mid i \in T\}$ and we will output 0
unless $M_0 \notin J$. It is easy to see that $\Pr[M_0 \notin J] \le 2^{-\Omega(k)} + 2^{-\Omega(k)}$, where
the first summand comes from the probability that $S \cap T = \emptyset$ and the second

Key Generation $\mathsf{NCGen}(1^k)$:
1. Pick $M_0, M_1$ at random from $\{0,1\}^k$.
2. Choose a random subset $T \subset [4k]$ of size $k$.
3. For $i = 1, 2, \ldots, 4k$, generate a pair $(PK_i, SK_i)$ as follows:

\[
(PK_i, SK_i) = \begin{cases} \mathsf{Gen}(1^k) & \text{if } i \in T \\ \mathsf{oGen}(1^k) & \text{otherwise} \end{cases}
\]

Set $e = [M_0, M_1, PK_1, \ldots, PK_{4k}]$ and $d = [T, SK_1, \ldots, SK_{4k}]$.


Encryption $\mathsf{NCEnc}_e(b)$:
1. Choose a random subset $S \subset [4k]$ of size $k$.
2. For $i = 1, 2, \ldots, 4k$, generate a ciphertext $c_i$ as follows:

\[
c_i = \begin{cases} \mathsf{Enc}_{PK_i}(M_b) & \text{if } i \in S \\ \mathsf{oRndEnc}_{PK_i}(1^k) & \text{otherwise} \end{cases}
\]

Set $c = [c_1, \ldots, c_{4k}]$.


Decryption $\mathsf{NCDec}_d(c)$:
1. Compute $J = \{\mathsf{Dec}_{SK_i}(c_i) \mid i \in T\}$.
2. If $M_0 \in J$, output 0; else, output 1.


Fig. 2. Non-Committing Encryption Scheme (NCGen, NCEnc, NCDec)
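
Simulation NCSim:
1. Pick $M_0, M_1$ at random from $\{0,1\}^k$.
2. Picking the sets $S_0, S_1, T_0, T_1$:
— Pick two random subsets $S_0, T_0$ of $[4k]$, each of size $k$.
— Pick two random subsets $S_1, T_1$ of $[4k] \setminus (S_0 \cup T_0)$ such that $|S_1 \cap T_1| = |S_0 \cap T_0|$.
3. Generating the keys: for $i = 1, 2, \ldots, 4k$, set

\[
(PK_i, SK_i) = \begin{cases} \mathsf{Gen}(1^k; r_G^i) & \text{if } i \in T_0 \cup S_0 \cup T_1 \cup S_1 \\ \mathsf{oGen}(1^k; \hat{r}_G^i) & \text{otherwise} \end{cases}
\]

4. Generating the ciphertext: for $i = 1, 2, \ldots, 4k$, set

\[
c_i = \begin{cases} \mathsf{Enc}_{PK_i}(M_0; r_E^i) & \text{if } i \in S_0 \\ \mathsf{Enc}_{PK_i}(M_1; r_E^i) & \text{if } i \in S_1 \\ \mathsf{oRndEnc}_{PK_i}(\hat{r}_E^i) & \text{otherwise} \end{cases}
\]

5. Simulating an opening to $b$: set $\sigma_G^b = \{T_b, u_G^{b,1}, \ldots, u_G^{b,4k}\}$ and
$\sigma_E^b = \{S_b, u_E^{b,1}, \ldots, u_E^{b,4k}\}$, where

\[
u_G^{b,i} = \begin{cases} r_G^i & \text{if } i \in T_b \\ \mathsf{rGen}(r_G^i) & \text{if } i \in (T_0 \cup T_1 \cup S_0 \cup S_1) \setminus T_b \\ \hat{r}_G^i & \text{otherwise} \end{cases}
\qquad
u_E^{b,i} = \begin{cases} r_E^i & \text{if } i \in S_b \\ \mathsf{rRndEnc}(r_G^i, r_E^i, M_{1-b}) & \text{if } i \in S_{1-b} \\ \hat{r}_E^i & \text{otherwise} \end{cases}
\]

Set $e = [M_0, M_1, PK_1, \ldots, PK_{4k}]$, $c = [c_1, \ldots, c_{4k}]$. Additionally output $\sigma_G^0, \sigma_E^0, \sigma_G^1, \sigma_E^1$.


Fig. 3. Non-Committing Encryption Scheme NCSim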


bounds the probability of a decryption error in the underlying encryption scheme
(Gen, Enc, Dec).

— Assume that the input $[c_1, \ldots, c_{4k}]$ to the decryption algorithm is a random
encryption of 1. Recall that $J = \{\mathsf{Dec}_{SK_i}(c_i) \mid i \in T\}$ and we will output 1
unless $M_0 \in J$. To bound $\Pr[M_0 \in J]$, observe that the distribution of $J$ depends
only on $M_1, PK_1, \ldots, PK_{4k}, T, SK_1, \ldots, SK_{4k}$ and the coin tosses used to generate
$c_1, \ldots, c_{4k}$, and is therefore independent of the choice of a random $M_0$. This means
that for each $i \in T$, the probability that $\mathsf{Dec}_{SK_i}(c_i)$ equals $M_0$ is $2^{-k}$. Taking a
union bound, we obtain $\Pr[M_0 \in J] \le k \cdot 2^{-k}$.
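
To get a feel for the two error terms in the correctness argument, the following Python sketch evaluates them exactly for small $k$. It assumes, as in Figure 2, that $S$ and $T$ are independent uniform $k$-subsets of $[4k]$, so that $\Pr[S \cap T = \emptyset] = \binom{3k}{k} / \binom{4k}{k}$; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from math import comb


def prob_disjoint(k):
    """Pr[S and T are disjoint] for independent uniform k-subsets of [4k]."""
    return Fraction(comb(3 * k, k), comb(4 * k, k))


def union_bound_collision(k):
    """The bound k * 2^(-k) on Pr[M_0 in J] for an encryption of 1."""
    return Fraction(k, 2 ** k)


if __name__ == "__main__":
    for k in (8, 16, 32, 64):
        print(k, float(prob_disjoint(k)), float(union_bound_collision(k)))
```

Each factor $(3k - j)/(4k - j)$ in the ratio of binomials is at most $3/4$, so the first term is at most $(3/4)^k$, which is one way to see that it is $2^{-\Omega(k)}$.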


Security. We need to show that for each $b = 0, 1$, a normal encryption of $b$ and a
simulated encryption of $b$ are computationally indistinguishable. Note that the view in a
normal encryption of $b$ contains two sets $T, S$, which we will label as $T_b, S_b$, and we will
append to the view two sets $T_{1-b}, S_{1-b}$ that are determined as follows: pick two random
subsets $S_{1-b}, T_{1-b}$ of $[4k] \setminus (S_b \cup T_b)$ such that $|S_1 \cap T_1| = |S_0 \cap T_0|$; call this distribution
$H_0$. We will also append to the view in a simulated encryption of $b$ the sets $T_{1-b}, S_{1-b}$
as determined by the experiment NCSim; call this distribution $H_{4k}$. We will show that
the augmented distributions $H_0$ and $H_{4k}$ are computationally indistinguishable in two
steps:


Reasoning about the sets. First, we claim that the 4-tuple $(S_0, T_0, S_1, T_1)$ in the
augmented distribution $H_0$ and in $H_{4k}$ are identically distributed. If $b = 0$, this is
obvious since the distributions are defined in exactly the same way. The case for $b = 1$
follows from a symmetry argument, namely that if we switch $(S_0, T_0)$ with $(S_1, T_1)$ in
the experiment NCSim, we get exactly the same distribution. Henceforth, it suffices to
argue that $H_0$ and $H_{4k}$ are computationally indistinguishable, conditioned on some
fixed $(S_0, T_0, S_1, T_1)$ in both $H_0$ and $H_{4k}$. We may now WLOG focus on the case
$b = 0$. In fact, we may as well also fix $M_0, M_1$ in both $H_0$ and $H_{4k}$. In addition to
$S_0, T_0, S_1, T_1, M_0, M_1$, the distributions $H_0, H_{4k}$ comprise:


— $4k$ public keys $PK_1, \ldots, PK_{4k}$ (generated using either Gen or oGen);
— $4k$ ciphertexts $c_1, \ldots, c_{4k}$ (generated using either Enc or oRndEnc);
— $4k$ sets of coin tosses $u_G^1, \ldots, u_G^{4k}$ for generating the public/secret keys; and
— $4k$ sets of coin tosses $u_E^1, \ldots, u_E^{4k}$ for generating the ciphertexts.


That is, we have $4k$ tuples of the form $(PK_i, c_i, u_G^i, u_E^i)$, $i = 1, \ldots, 4k$, in each view.
Since $S_0, T_0, S_1, T_1$ are fixed, each of these $4k$ tuples is independently sampled from
some distribution that only depends on the index $i$. Denote by $X_1, \ldots, X_{4k}$ the random
variables for the $4k$ tuples in $H_0$, and $Y_1, \ldots, Y_{4k}$ the random variables for the $4k$ tuples
in $H_{4k}$.


The hybrid argument. Next, we argue that $X_i$ and $Y_i$ are computationally indistinguishable
for $i = 1, \ldots, 4k$, from which the indistinguishability of $H_0$ and $H_{4k}$ follows via
a hybrid argument. There are several cases we need to consider:


— $i \in T_0$ or $i \in [4k] \setminus (T_0 \cup S_0 \cup T_1 \cup S_1)$. It is easy to verify that in either of these
cases, $X_i$ and $Y_i$ are identically distributed.
— $i \in S_1$ (“oGen, oRndEnc $\approx$ Gen, Enc”). Here, $X_i$ is the distribution

$\{\hat{PK}, \hat{c}, \hat{r}_G, \hat{r}_E \mid (\hat{PK}, \bot) = \mathsf{oGen}(\hat{r}_G),\ \hat{c} = \mathsf{oRndEnc}_{\hat{PK}}(\hat{r}_E)\}$

and $Y_i$ is the distribution

$\{PK, c, \mathsf{rGen}(r_G), \mathsf{rRndEnc}(r_G, r_E, M_1) \mid (PK, SK) = \mathsf{Gen}(r_G),\ c = \mathsf{Enc}_{PK}(M_1; r_E)\}$.

Indistinguishability follows immediately from the security of the trapdoor simulatable
PKE.
— $i \in S_0 \setminus T_0$ (“oGen, Enc $\approx$ Gen, Enc”). Here, $X_i$ is the distribution

$\{\hat{PK}, c, \hat{r}_G, r_E \mid (\hat{PK}, \bot) = \mathsf{oGen}(\hat{r}_G),\ c = \mathsf{Enc}_{\hat{PK}}(M_0; r_E)\}$

and $Y_i$ is the distribution

$\{PK, c, \mathsf{rGen}(r_G), r_E \mid (PK, SK) = \mathsf{Gen}(r_G),\ c = \mathsf{Enc}_{PK}(M_0; r_E)\}$.

Indistinguishability follows again from the security of the trapdoor simulatable
PKE.
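— $i \in T_1 \setminus S_1$ (“oGen, oRndEnc $\approx$ Gen, oRndEnc”). Here, $X_i$ is the distribution

$\{\hat{PK}, \hat{c}, \hat{r}_G, \hat{r}_E \mid (\hat{PK}, \bot) = \mathsf{oGen}(\hat{r}_G),\ \hat{c} = \mathsf{oRndEnc}_{\hat{PK}}(\hat{r}_E)\}$

and $Y_i$ is the distribution

$\{PK, \hat{c}, \mathsf{rGen}(r_G), \hat{r}_E \mid (PK, SK) = \mathsf{Gen}(r_G),\ \hat{c} = \mathsf{oRndEnc}_{PK}(\hat{r}_E)\}$.

Indistinguishability follows again from the security of the trapdoor simulatable
PKE.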


Improving the Efficiency. Instead of using sets $S, T \subset [4k]$ of size $k$, we choose
$S, T \subset [40]$ of size 10. The previous analysis still goes through, except we now have a
constant decryption error. To address this problem, we first encode the message⁷ with
a linear-rate error-correcting code that corrects a constant fraction of errors, and then
encrypt the codeword with the encryption scheme with constant error.


6 Trapdoor Simulatable PKE from Hardness of Factoring 


Theorem 5. Suppose factoring Blum integers is hard on average and Blum
integers are dense. Then there exists a trapdoor simulatable PKE.


For simplicity, we only present a 1-bit trapdoor simulatable encryption scheme; we may 
encrypt longer messages by encrypting bit by bit. 


A number-theoretic lemma. Fix any $k$-bit integer modulus $N$; we will work with
the group $\mathbb{Z}_N^*$. We will use factor($N$) to denote the factorization of $N$, and we define
$Q_N = \{a^{2^k} \mid a \in \mathbb{Z}_N^*\}$. Now, consider the map $\psi_N : Q_N \to Q_N$ given by $\psi_N(x) =
x^2 \pmod N$. As shown in earlier work (Facts 3.5-3.7 there), $\psi_N$ defines a permutation on $Q_N$. We
provide a more direct proof which also yields an efficient algorithm to invert $\psi_N$ given
factor($N$).


Claim. The map wy defines a permutation on Qy. 


Proof. Let $q$ denote the largest odd divisor of $\phi(N)$, where $\phi(\cdot)$ is Euler's totient
function. It is easy to see that $\phi(N)$ divides $2^k q$, since $N < 2^k$. Take any $y \in Q_N$,
where $y = a^{2^k}$. Then by Euler's theorem, $y^q = 1 \pmod N$ and thus $\psi_N(y^{(q+1)/2}) =
y \pmod N$. Clearly, $y^{(q+1)/2} \in Q_N$, so the map $\psi_N$ is surjective. Moreover, the range
and domain of $\psi_N$ have equal sizes, so $\psi_N$ must define a bijection.
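
The claim and the inversion formula from the proof can be checked by brute force on toy moduli. The Python sketch below is only a sanity check under stated assumptions: `euler_phi` enumerates units instead of using factor($N$), $k$ is taken to be the bit length of $N$ (so that $N < 2^k$), and all function names are illustrative rather than taken from the paper.

```python
from math import gcd


def euler_phi(n):
    """Euler's totient by brute force (fine for toy moduli only)."""
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)


def largest_odd_divisor(m):
    while m % 2 == 0:
        m //= 2
    return m


def q_set(n):
    """Q_N = { a^(2^k) mod N : a in Z_N^* }, with k the bit length of N."""
    k = n.bit_length()
    return {pow(a, 2 ** k, n) for a in range(1, n) if gcd(a, n) == 1}


def psi(x, n):
    """The squaring map x -> x^2 mod N."""
    return (x * x) % n


def psi_inverse(y, n):
    """Invert squaring on Q_N via y -> y^((q+1)/2), as in the proof."""
    q = largest_odd_divisor(euler_phi(n))
    return pow(y, (q + 1) // 2, n)


def check_claim(n):
    """True if squaring permutes Q_N and psi_inverse inverts it inside Q_N."""
    qn = q_set(n)
    is_permutation = {psi(x, n) for x in qn} == qn
    inverts = all(
        psi(psi_inverse(y, n), n) == y and psi_inverse(y, n) in qn for y in qn
    )
    return is_permutation and inverts
```

For example, for $N = 21$ one has $k = 5$, $\phi(N) = 12$ and $q = 3$, and the set $Q_{21}$ consists of the three elements 1, 4 and 16, which squaring permutes cyclically (4 to 16 and 16 back to 4).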


The construction. We sketch the construction here; the formal construction is shown
in Figure 4.


7 The codeword length (or, equivalently, the message length) should be $\Omega(k)$. Then, by the Chernoff
bound, the number of decryption errors remains a constant fraction of the codeword length with
overwhelming probability.
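
Key generation $\mathsf{Gen}(1^k)$:
1. Run Bach's algorithm using the randomness $r_G$ to sample random $N_1, \ldots, N_{k^3} \in
\{0,1\}^k$ along with their factorizations factor($N_1$), ..., factor($N_{k^3}$).
2. Set $PK = [N_1, \ldots, N_{k^3}]$ and $SK = [\text{factor}(N_1), \ldots, \text{factor}(N_{k^3})]$.
Encryption $\mathsf{Enc}(b)$:
1. Parse the randomness $r_E$ as $(a_1, \ldots, a_{k^3}) \in \mathbb{Z}_{N_1}^* \times \cdots \times \mathbb{Z}_{N_{k^3}}^*$, $r_1, \ldots, r_{k^3} \in \{0,1\}^k$
and $b_1, \ldots, b_{k^3-1} \in \{0,1\}$.
2. Compute $b_{k^3} = b \oplus b_1 \oplus \cdots \oplus b_{k^3-1}$.
3. Compute $x_i = a_i^{2^k} \in Q_{N_i}$, $i = 1, \ldots, k^3$.
4. Output $[\pi_{N_i}(x_i), r_i, (x_i \cdot r_i) \oplus b_i,\ i = 1, \ldots, k^3]$.
Decryption $\mathsf{Dec}(c)$:
1. Parse $c$ as $[y_i, r_i, \beta_i,\ i = 1, \ldots, k^3]$.
2. Compute $b_i = (\pi_{N_i}^{-1}(y_i) \cdot r_i) \oplus \beta_i$, $i = 1, \ldots, k^3$.
3. Output $b_1 \oplus \cdots \oplus b_{k^3}$.
Oblivious key generation $\mathsf{oGen}(1^k)$:
1. Parse the randomness $\hat{r}_G \in \{0,1\}^{k^4}$ as $N_1, \ldots, N_{k^3} \in \{0,1\}^k$.
2. Output $(N_1, \ldots, N_{k^3})$.
Trapdoor invertibility key generation $\mathsf{rGen}(r_G)$:
1. Run $\mathsf{Gen}(r_G)$ to obtain $\hat{r}_G = (N_1, \ldots, N_{k^3})$.
2. Output $\hat{r}_G$.
Oblivious sampling of ciphertexts $\mathsf{oRndEnc}(1^k)$:
1. Parse the randomness $\hat{r}_E$ as $(\gamma_1, \ldots, \gamma_{k^3}) \in \mathbb{Z}_{N_1}^* \times \cdots \times \mathbb{Z}_{N_{k^3}}^*$, $s_1, \ldots, s_{k^3} \in \{0,1\}^k$
and $\beta_1, \ldots, \beta_{k^3} \in \{0,1\}$.
2. Compute $y_i = \gamma_i^{2^k} \in Q_{N_i}$, $i = 1, \ldots, k^3$.
3. Output $[y_i, s_i, \beta_i,\ i = 1, \ldots, k^3]$.
Trapdoor invertibility for ciphertexts $\mathsf{rRndEnc}(r_G, r_E, b)$:
1. Use $r_G$ to compute factor($N_1$), ..., factor($N_{k^3}$), and parse $r_E$ as in Enc.
2. Set $s_i = r_i$ and $\beta_i = (x_i \cdot r_i) \oplus b_i$, $i = 1, \ldots, k^3$.
3. Pick a random $\gamma_i$ uniformly from the set $\{\gamma_i \in \mathbb{Z}_{N_i}^* \mid \gamma_i^{2^k} = \pi_{N_i}(x_i)\}$.
4. Output $\hat{r}_E = (\gamma_1, \ldots, \gamma_{k^3}, s_1, \ldots, s_{k^3}, \beta_1, \ldots, \beta_{k^3})$.


Fig. 4. Trapdoor Simulatable PKE from hardness of factoring Blum integers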


STEP 1: First, we construct a family of “weakly one-way” enhanced trapdoor
permutations. We start by modifying $\psi_N$ to obtain a new family of permutations
$\pi_N$; the modification is analogous to that in [G04, Section C.1] to obtain enhanced
trapdoor permutations from Rabin's trapdoor permutations. The permutations $\pi_N :
Q_N \to Q_N$ are indexed by a $k$-bit integer $N$ and are given by:

\[
\pi_N(x) = \psi_N^{k+1}(x) = x^{2^{k+1}} \pmod N
\]

and the trapdoor is factor($N$). We may sample from this family by running Bach's
algorithm to pick a random $k$-bit integer along with its factorization.

It is easy to verify that $\pi_N$ is a family of trapdoor permutations. Clearly, $\pi_N$ is a
permutation because it is the $(k+1)$-fold iterate of the permutation $\psi_N$. Given the
index $N$, $\pi_N$ is efficiently computable by repeated squaring. Given the trapdoor
factor($N$), $\pi_N^{-1}$ is efficiently computable by simply mapping $y$
to $y^{((q+1)/2)^{k+1}}$, i.e., raising $y$ to the $(q+1)/2$'th power $k+1$ times. Here, $q$
denotes the largest odd divisor of $\phi(N)$, which is easy to compute with the trapdoor.
Moreover, we can show that if $N$ is a Blum integer (which occurs with probability
$\Omega(1/k^2)$ [GM04, RS94]), then inverting $\pi_N$ given $N$ is at least as hard as factoring
$N$. This implies that $\pi_N$ is one-way with probability $\Omega(1/k^2)$ over the choice of $N$.

STEP 2: Construct a “weak” encryption scheme using the standard construction of
PKE from trapdoor permutations via the Goldreich-Levin hard-core predicate. The
public key is $N$, the secret key is factor($N$), and to encrypt a bit $b$, we pick a
random $x \in Q_N$, $r \in \{0,1\}^k$ and output $(\pi_N(x), r, (x \cdot r) \oplus b)$, where $x \cdot r$ is
the standard dot-product of $k$-bit strings. Again, this scheme will be semantically
secure with probability $\Omega(1/k^2)$ over the choice of $N$.

STEP 3: To boost the security of the “weak” encryption scheme, we define a new
scheme where the public key is $k^3$ random $k$-bit strings $N_1, \ldots, N_{k^3}$ (with
overwhelming probability, one of these is a Blum integer), and to encrypt a bit
$b$, we pick random $b_1, \ldots, b_{k^3}$ such that $b = b_1 \oplus \cdots \oplus b_{k^3}$ and concatenate
the encryptions of $b_1, \ldots, b_{k^3}$ under the respective public keys $N_1, \ldots, N_{k^3}$. By
a standard argument (cf. [DP92]), this encryption scheme is semantically
secure in the standard sense.
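
The mechanics of Steps 1 to 3 can be illustrated with a small Python sketch. It is a toy under explicit assumptions: the moduli are tiny hand-picked odd integers rather than $k^3$ random $k$-bit strings, `euler_phi` is computed by brute force as a stand-in for knowing factor($N$), $k$ is taken to be the bit length of $N$, and all function names are hypothetical. It demonstrates correctness of decryption only and offers no security.

```python
import secrets
from functools import reduce
from math import gcd
from operator import xor


def euler_phi(n):
    # Brute force; stands in for knowing factor(N) (toy moduli only).
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)


def odd_part(m):
    while m % 2 == 0:
        m //= 2
    return m


def random_unit(n):
    """Sample a uniform element of Z_N^* by rejection."""
    while True:
        a = secrets.randbelow(n - 1) + 1
        if gcd(a, n) == 1:
            return a


def dot(x, r):
    """Inner product mod 2 of the bit representations of x and r."""
    return bin(x & r).count("1") % 2


def pi(x, n, k):
    """pi_N(x) = x^(2^(k+1)) mod N."""
    return pow(x, 2 ** (k + 1), n)


def pi_inverse(y, n, k):
    """Raise y to the (q+1)/2'th power k+1 times (uses the trapdoor via phi)."""
    q = odd_part(euler_phi(n))
    for _ in range(k + 1):
        y = pow(y, (q + 1) // 2, n)
    return y


def weak_encrypt(n, b):
    """Step 2: Goldreich-Levin style encryption of one bit under modulus n."""
    k = n.bit_length()
    x = pow(random_unit(n), 2 ** k, n)  # random element of Q_N
    r = secrets.randbits(k)
    return pi(x, n, k), r, dot(x, r) ^ b


def weak_decrypt(n, c):
    y, r, beta = c
    return dot(pi_inverse(y, n, n.bit_length()), r) ^ beta


def encrypt(moduli, b):
    """Step 3: XOR-share b across the moduli and encrypt each share."""
    shares = [secrets.randbits(1) for _ in moduli[:-1]]
    shares.append(reduce(xor, shares, b))  # XOR of all shares equals b
    return [weak_encrypt(n, s) for n, s in zip(moduli, shares)]


def decrypt(moduli, c):
    return reduce(xor, (weak_decrypt(n, ci) for n, ci in zip(moduli, c)), 0)
```

Decryption is exact here because $\pi_N$ is a permutation of $Q_N$, so the receiver recovers the same $x$ that the sender used and hence the same dot-product mask.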


Analysis. Indeed, we claim something stronger — that the encryption scheme derived 
in Step 3 is a trapdoor simulatable PKE. 


— (Oblivious sampling & trapdoor invertibility for key generation) This is trivial,
since a random public key corresponds to a string in $\{0,1\}^{k^4}$. We can clearly
sample such a public key without learning the secret key.

— (Oblivious sampling & trapdoor invertibility for random ciphertext) For simplicity,
we present the algorithms for sampling a random ciphertext for the scheme obtained
in Step 2. Here, sampling is easy: on input the public key $N$, pick $\gamma \in \mathbb{Z}_N^*$, $s \in
\{0,1\}^k$, $\beta \in \{0,1\}$ and output $(\gamma^{2^k}, s, \beta)$. To implement reverse sampling, we
need an efficient algorithm that, given factor($N$) and $x \in Q_N$, outputs a random
element of the set $\{\gamma \in \mathbb{Z}_N^* \mid \gamma^{2^k} = \pi_N(x) = x^{2^{k+1}}\}$. This can be accomplished
as follows: pick a random $\eta \in \mathbb{Z}_N^*$ and output $x^2 \cdot \eta / (\eta^{2^k})^{((q+1)/2)^k}$, where $q$ is, as
before, the largest odd divisor of $\phi(N)$. This works because $\eta / (\eta^{2^k})^{((q+1)/2)^k}$ will
be a random $2^k$'th root of 1 (mod $N$).
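
The reverse-sampling step can be sketched in Python as follows. As before, this is a toy check under assumptions: `euler_phi` is a brute-force stand-in for the trapdoor factor($N$), $k$ is taken to be the bit length of $N$, the optional `eta` argument exists only to make the computation reproducible, and the function names are illustrative. The modular inverse uses `pow(z, -1, n)`, which requires Python 3.8 or later.

```python
import secrets
from math import gcd


def euler_phi(n):
    # Brute force; stands in for knowing factor(N) (toy moduli only).
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)


def odd_part(m):
    while m % 2 == 0:
        m //= 2
    return m


def random_unit(n):
    """Sample a uniform element of Z_N^* by rejection."""
    while True:
        a = secrets.randbelow(n - 1) + 1
        if gcd(a, n) == 1:
            return a


def random_root_of_pi(x, n, eta=None):
    """Return gamma with gamma^(2^k) = x^(2^(k+1)) mod n, for x in Q_N."""
    k = n.bit_length()
    q = odd_part(euler_phi(n))
    eta = random_unit(n) if eta is None else eta
    # z is the unique element of Q_N with z^(2^k) = eta^(2^k):
    # start from eta^(2^k) and undo squaring k times inside Q_N.
    z = pow(eta, 2 ** k, n)
    for _ in range(k):
        z = pow(z, (q + 1) // 2, n)
    root_of_one = (eta * pow(z, -1, n)) % n  # a 2^k'th root of 1 mod n
    return (x * x * root_of_one) % n
```

For instance, with $N = 21$ and $\eta = 2$ one gets $\eta^{32} = 4$, $z = 16$ and the root of unity $2 \cdot 16^{-1} = 8$, and indeed $8^2 = 1 \pmod{21}$.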


For the actual proof of security, we will need to show that if $N$ is a random Blum integer,
then the following distributions are computationally indistinguishable for every $b$:


$\{(N, \gamma, \pi_N(x), r, (x \cdot r) \oplus b)\}$ and $\{(N, \gamma, \gamma^{2^k}, r, \beta)\}$


The first distribution corresponds to an encryption of $b$ using modulus $N$ and randomness
$(x, r)$ along with $\gamma$, the output of rRndEnc (a random solution to the equation
$\gamma^{2^k} = \pi_N(x)$). The second corresponds to an obliviously generated ciphertext along
with the randomness. If there exists an efficient distinguisher, then there exists an
efficient procedure $A$ that on input $N, \gamma$ outputs $\pi_N^{-1}(\gamma^{2^k})$ with noticeable probability.
Since squaring is a bijection on quadratic residues modulo Blum integers, the output
of $A$ is also the 4th root of $\gamma^2$. We may then use a reduction in [G04, Section C.1] to
derive from $A$ an algorithm for factoring $N$ with noticeable probability.


7 Oblivious Transfer and MPC


We describe the construction underlying Theorem 4, which proceeds in two steps:


STEP 1: We begin with the [CLOS02] construction of a semi-honest OT protocol as
applied to our non-committing encryption scheme, and observe that the protocol is
secure against malicious senders. For that, we just need to show how to extract the
sender's input when the receiver is honest. In this case, the simulator will generate
the public keys sent by the receiver in the first message along with the secret keys,
so that it can then extract the malicious sender's input by decrypting.

STEP 2: Next, we apply the compiler in [CDSMW09] to “boost” the security guarantee
from tolerating semi-honest receivers to tolerating malicious receivers. (Note that
we will not need to apply OT reversal as in [CDSMW09].)


Acknowledgements. We thank Ran Canetti, Yuval Ishai, Jonathan Katz, and Chris 
Peikert for helpful discussions and clarifications. 

Abstract. We give a construction of non-malleable statistically hiding
commitments based on the existence of one-way functions. Our construction
employs statistically hiding commitment schemes recently proposed
by Haitner and Reingold [1], and special-sound WI proofs. Our proof of
security relies on the message scheduling technique introduced by Dolev,
Dwork and Naor [2], and requires only the use of black-box techniques.


1 Introduction 


A commitment scheme is an interactive protocol between two parties, the com- 
mitter, who holds a value, and the receiver. It usually consists of two phases: the 
commit phase and the reveal phase. During the commit phase, the committer 
puts a value in a “locked box” and sends it to the receiver. In the reveal phase, 
the committer sends the “key” to the receiver, then the receiver opens the box 
and retrieves the value. Two basic properties of a commitment scheme are the
hiding property (the receiver cannot learn the committed value before the reveal
phase) and the binding property (the committer is bound to one value after
the commit phase). There are two fundamental types of commitment schemes,
statistically hiding and statistically binding. In this work, we focus mainly on
statistically hiding commitment schemes, where the hiding property holds against
unbounded receivers while the binding property is required to hold only against
polynomially bounded senders.

The concept of non-malleability was first introduced by Dolev et al. [2].
The basic properties of commitment schemes cannot prevent malleable attacks 
mounted by a man-in-the-middle adversary who has full control of the commu- 
nication channel between the committer and the receiver. Loosely speaking, a 
commitment scheme is non-malleable if one cannot transform the commitment 
of a value into a commitment of a related value. This kind of non-malleability 
is called non-malleability with respect to commitment [2]. The notion of
non-malleability used by Di Crescenzo et al. [4] is called non-malleability with respect
to opening, i.e., the adversary cannot construct a commitment from a given one, 
such that after having seen the opening of the original commitment, the adver- 
sary is able to correctly open his commitment with a related value. In the rest 
of this paper, when we say non-malleability, we actually mean non-malleability 
with respect to opening. 


M. Matsui (Ed.): ASIACRYPT 2009, LNCS 5912, pp. 303-618, 2009. 


1.1 Related Work


Statistically hiding commitment schemes were first shown to exist based on 
number-theoretic assumptions Io], or more generally, based on any collec- 
tion of claw-free permutations [7] with an efficiently-recognizable index set [8]. 
Subsequent constructions of statistically hiding commitment schemes are
based on collision-resistant hash functions [9], on any one-way permutation
[10], or on regular one-way functions [11]. Nguyen et al. and
Haitner and Reingold [1] made fundamental progress by constructing statistically
hiding commitment schemes based on the minimal cryptographic assumption
that one-way functions exist.

Based on number-theoretic assumptions, non-malleable statistically hiding 
commitment schemes were designed in assuming the existence of a common 
reference string that is shared by the two players before the protocol execution. 
Thus, their schemes do not work in the plain model (i.e., without setup assumptions).
More recently, Pass and Rosen [14] constructed a non-malleable commitment
scheme that was statistically hiding based on a family of collision-resistant
hash functions. Their scheme is round-efficient and needs only constant-round 
communication. However, the security proof relies on non-black-box techniques 
and is not efficient. 

As one of the central goals of cryptography is to reduce complexity assumptions 
for various cryptographic primitives and construct them under more standard as- 
sumptions, there remain open questions as to whether or not non-malleable statis- 
tically hiding commitment can be based solely on the existence of one-way functions, 
and be shown secure relying only on black-box techniques. 


1.2 Our Result 


In this paper, we give affirmative answers to both of the questions posed above. 
We show that the existence of one-way function is a sufficient condition for the 
existence of non-malleable statistically hiding commitment. 


Theorem 1. If one-way functions exist, then there exists a non-malleable sta- 
tistically hiding commitment scheme. 


Our commitment scheme uses the commitment scheme of [1] to commit to the
desired value, but modifies the opening process by adding a “trapdoor” that can
be extracted and used by the simulator to cheat in the reveal phase, and would
not be known to the committer in a real execution. Although the extraction
requires rewinding, we rely on the message scheduling technique of Lin et al. [5],
which is a slight modification of the message scheduling technique introduced
by Dolev et al. [2], to show that this suffices to prove non-malleability. Our
proof requires only standard black-box techniques. As a tradeoff, however, our
protocol needs polynomially many rounds of interaction.

The preliminaries and definitions are presented in Section 2. Our non-malleable
statistically hiding commitment scheme is shown in Section 3.


2 Preliminaries and Definitions


For any NP language $L$, note that there is a natural witness relation $R_L$ containing
pairs $(x, w)$ where $w$ is the witness for the membership of $x$ in $L$. A function
$\mu(\cdot)$, where $\mu : N \to [0,1]$, is called negligible if for every positive polynomial
$p(\cdot)$ and all sufficiently large $n \in N$, $\mu(n) < \frac{1}{p(n)}$. A probability ensemble is a
sequence $X = \{X_i\}_{i \in I}$ of random variables, where $I$ is a countable index set
and $X_i$ is a random variable ranging over $\{0,1\}^{p(|i|)}$ for some polynomial $p(\cdot)$.
Two probability ensembles $X = \{X_i\}_{i \in I}$ and $Y = \{Y_i\}_{i \in I}$ are computationally
indistinguishable if no probabilistic polynomial-time (PPT) algorithm distinguishes
between them with more than negligible probability. Due to space limitations,
we assume that the reader is familiar with interactive proofs.


Special-sound proofs. A 3-round public-coin interactive proof for the language
$L \in NP$ with witness relation $R_L$ is special-sound with respect to $R_L$ if for any
two accepting transcripts $(\alpha, \beta, \gamma)$ and $(\alpha', \beta', \gamma')$ for some statement $x \in L$, such
that $\alpha = \alpha'$ and $\beta \neq \beta'$, a witness $w$ such that $(x, w) \in R_L$ can be computed by
a polynomial-time deterministic procedure.


2.1 Witness Indistinguishability 


The concept of witness indistinguishability was proposed by Feige and Shamir 
[16]. An interactive proof system is witness indistinguishable (WI) if the verifier 
cannot tell which of the witnesses is being used by the prover to carry out the 
proof, even if the verifier knows both witnesses. We focus on NP languages L 
with a corresponding witness relation R. The readers are referred to for 
formal definition. 

Special-sound WI proofs for NP languages can be based on the existence
of non-interactive commitment schemes. Assuming only one-way functions,
4-round special-sound WI proofs for NP languages exist.¹ More precisely, there is
a 3-round special-sound WI proof for the language of Hamiltonian Graphs [17]
assuming one-way permutation families exist. If the commitment scheme used
by the protocol [17] is replaced by Naor's commitment scheme [13], then it
becomes a 4-round special-sound WI proof while the assumption is reduced to
the existence of one-way functions. For simplicity, we use 3-round special-sound
WI proofs in our protocol, though our proof works also with 4-round
special-sound WI proofs.


2.2 Commitment Schemes 
In this work, we consider statistically hiding commitment schemes. 


Definition 1 (Commitment Scheme). A pair of PPT interactive machines 
(C, R) is said to be a commitment scheme if the following two properties hold: 


1 A 4-round protocol is special-sound if there exists a polynomial-time deterministic
procedure to extract the witness from any two accepting transcripts $(\tau, \alpha, \beta, \gamma)$ and
$(\tau', \alpha', \beta', \gamma')$ such that $\tau = \tau'$, $\alpha = \alpha'$ and $\beta \neq \beta'$.

Statistical hiding: For every unbounded interactive Turing machine $R^*$, it holds
that the ensemble $\{\mathrm{sta}^{R^*}_{(C,R)}(v_1, z)\}_{v_1 \in \{0,1\}^n, n \in N, z \in \{0,1\}^*}$ and the ensemble
$\{\mathrm{sta}^{R^*}_{(C,R)}(v_2, z)\}_{v_2 \in \{0,1\}^n, n \in N, z \in \{0,1\}^*}$ have negligible statistical difference,²
where $\mathrm{sta}^{R^*}_{(C,R)}(v, z)$ denotes the random variable describing the output of $R^*$
after receiving a commitment to $v$ using $(C, R)$.

Computational binding: A malicious (expected) PPT committer S* can succeed
in opening a given commitment in two different ways only with negligible
probability. The reader is referred to [JI for more details. 


2.3 Non-malleable Commitments 


As stated in [J], we formalize the notion of non-malleability by a comparison
between a man-in-the-middle execution and a simulated execution. Just as RELI, 
we consider a tag-based variant of non-malleability. 

Let $(C, R)$ be a commitment scheme. Let $n \in N$ be a security parameter.
Let $R \subseteq \{0,1\}^n \times \{0,1\}^n$ be a polynomial-time computable valid relation
(i.e., for all $v \in \{0,1\}^n$, $R(v, \bot) = 0$). In the man-in-the-middle execution,
the adversary $A$ is simultaneously participating in a left and a right interaction.
In the left interaction, the man-in-the-middle adversary $A$ interacts with the
committer $C$ to receive a commitment to a value $v$ using tag $\mathrm{tag}$. In the right
interaction, $A$ interacts with the receiver $R$ and tries to commit to a related value
$\tilde{v}$ using a tag of its choice $\widetilde{\mathrm{tag}}$. After the commit phase execution in both interactions,
$A$ receives decommitment keys from $C$ and then generates the corresponding
decommitment key for $\tilde{v}$. Prior to the interaction, the value $v$ is given to $C$ as local
input. $A$ receives an auxiliary input $z$, which might contain a priori information
about $v$. If the right commitment or decommitment fails, or $\mathrm{tag} = \widetilde{\mathrm{tag}}$, $\tilde{v}$ is
set to $\bot$. Let the boolean random variable $\mathrm{mim}^A_{open}(R, v, z)$ denote whether $A$
succeeds. Note $\mathrm{mim}^A_{open}(R, v, z) = 1$ if and only if $A$ decommits to a value $\tilde{v}$ such
that $R(v, \tilde{v}) = 1$.

In the simulated execution, a simulator $S$ directly interacts with an honest
receiver $R$. As in the man-in-the-middle execution, the value $v$ is chosen prior
to the interaction, and $S$ receives some a priori information about $v$ as part of
its auxiliary input $z$. $S$ also receives the tag $\mathrm{tag}$. $S$ first executes the commitment
scheme with $R$. Once the commitment phase has been completed, $S$ receives the
value $v$ and attempts to decommit to a value $\tilde{v}$ with tag $\widetilde{\mathrm{tag}}$. If $\mathrm{tag} = \widetilde{\mathrm{tag}}$, $\tilde{v}$
is set to $\bot$. Let the boolean random variable $\mathrm{sim}^S_{open}(R, v, z)$ denote whether $S$
succeeds. Note $\mathrm{sim}^S_{open}(R, v, z) = 1$ if and only if $S$ decommits to a value $\tilde{v}$ such
that $R(v, \tilde{v}) = 1$.


Definition 2 (Non-malleable Commitment [[4]). A commitment scheme
(C, R) is said to be non-malleable with respect to opening if for every PPT
man-in-the-middle adversary A, there exists an expected PPT simulator S and a
negligible function $\mu : N \to [0,1]$, such that for every polynomial-time computable
valid relation $R \subseteq \{0,1\}^n \times \{0,1\}^n$, for all tags of polynomial length, for every
$v \in \{0,1\}^n$ and every $z \in \{0,1\}^*$, the following holds:

$$\Pr[\mathsf{mim}^{A}_{\mathrm{open}}(R, v, z) = 1] \le \Pr[\mathsf{sim}^{S}_{\mathrm{open}}(R, v, z) = 1] + \mu(n)$$


2 The statistical difference between two ensembles $\{X_i\}_{i \in I}$ and $\{Y_i\}_{i \in I}$ is defined by
$\frac{1}{2} \sum_{\alpha} |\Pr[X_i = \alpha] - \Pr[Y_i = \alpha]|$.

A commitment scheme that is non-malleable according to Definition 2 is liberal
non-malleable rather than strict non-malleable BB]. Note we follow [A in that
non-malleability is guaranteed only if the commit phase and the reveal phase do
not overlap.


3 Construction 


We begin by presenting a high-level overview of our protocol. Our protocol is based
on the statistically hiding commitment scheme [I], while relying on a message
scheduling technique that is a slight modification of the message scheduling
technique of [2]. The commit phase of our protocol is the same as that of
the commitment protocol in [I]. The reveal phase, however, comes in two parts.
Roughly, the reveal phase employs the two-witness technique by Feige and the
well-known FLS-technique PJ]. First, the receiver proves that it knows a preimage
of either element $s_0$ or element $s_1$, which it computed itself, in the domain of a
one-way function. Then, the committer sends the committed value $v$ and proves that
it knows how to open the commitment or a preimage of either element $s_0$
or element $s_1$. The proofs used by the prover and the verifier are all tag-based WI
proofs, scheduled as in [15]. For simplicity of exposition, our description
relies on the existence of one-way functions with efficiently recognizable range.³
We also assume the one-way function is length-preserving, since any one-way
function can be transformed into a length-preserving one-way function [9].


3.1 Tag-Based Witness-Indistinguishable Proof 


First, we propose a tag-based WI proof for every NP language L, which is used as
a basic tool in the final commitment scheme. The length of the tag is polynomially
bounded in the security parameter $n$. Denote the polynomial by
$t(\cdot)$. In Fig. 1, both $\mathsf{design}_0$ and $\mathsf{design}_1$ contain two executions of special-sound
WI proofs for L but with elaborately designed scheduling. The tag-based WI
proof $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ for L is shown in Fig. 2. The protocol is composed of $4t$
special-sound WI proofs for language L. More precisely, there are $t$ rounds, where
in round $j$, the schedule $\mathsf{design}_{\mathsf{tag}_j}$ is followed by $\mathsf{design}_{1-\mathsf{tag}_j}$. The properties of
$(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ are easy to verify. The details are omitted.
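
The round structure just described can be sketched in a few lines of Python. This is only an illustration of the ordering of schedules: the function names `tag_schedule` and `num_special_sound_proofs` are hypothetical, the tag is modeled as a string of '0'/'1' characters, and the internal message order of each design (Fig. 1) is deliberately not modeled.

```python
from typing import List


def tag_schedule(tag: str) -> List[str]:
    """Return the sequence of schedules run for a given tag.

    In round j the schedule design_{tag_j} is followed by design_{1 - tag_j}.
    """
    if any(bit not in "01" for bit in tag):
        raise ValueError("tag must be a bit string")
    schedule = []
    for bit in tag:
        schedule.append(f"design_{bit}")
        schedule.append(f"design_{1 - int(bit)}")
    return schedule


def num_special_sound_proofs(tag: str) -> int:
    """Each design contains two special-sound WI proofs, giving 4t in total."""
    return 2 * len(tag_schedule(tag))
```

For a tag of length t the sketch yields 2t designs and therefore 4t special-sound proofs, matching the count stated in the text. Two different tags of the same length differ in at least one round, where the two designs appear in opposite order.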

One basic technique in proving the security of most zero-knowledge and
commitment protocols is standard rewinding. However, the rewinding technique is
problematic when extended to a concurrent (here one-left one-right) execution
environment, as an adversary may adaptively schedule its messages so as to withstand
any targeted simulator (i.e., the simulator may run in super-polynomial time or be
exposed to a malleability attack). Considering the non-malleability property for
commitment schemes, the key is to design a stand-alone simulator that
satisfies Definition 2. Here we also face the problem of how to simulate
when the adversary adaptively schedules its messages.


3 The protocol can be easily modified to work with an arbitrary one-way function by
providing a witness hiding proof that an element is in the range of the one-way function.


[Diagram of the message order of $\alpha_1, \beta_1, \gamma_1, \alpha_2, \beta_2, \gamma_2$ in (a) $\mathsf{design}_0$ and (b) $\mathsf{design}_1$ not recoverable]

Fig. 1. Two schedules


Protocol $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$
Security Parameter: $1^n$
Common Input: An instance $x \in \{0,1\}^n$
Tag string: $\mathsf{tag} \in \{0,1\}^{t(n)}$
For $j = 1$ to $t(n)$
P ⇔ V: Execute $\mathsf{design}_{\mathsf{tag}_j}$
Execute $\mathsf{design}_{1-\mathsf{tag}_j}$

Fig. 2. Tag-based WI proof $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$


The scheduling in Fig. 1, which is identical to that of [15], is vital in achieving
non-malleability. The main advantage of this scheduling is that, for the proof given
by a man-in-the-middle adversary, there exists a point at which the adversary
cannot answer the challenge from the verifier by simply modifying the proof on
the other side (provided the tag of the proof is different from that of the proof
on the other side).

Related to the above scheduling is a notion called safe-point, from which it is
possible to perform extraction by standard rewinding until we obtain a second
proof transcript, without “affecting” the interaction on the other side. Below is the
formal definition of a safe-point, which is mainly taken from [15] and abridged for
our setting.


Definition 3 (Safe-point [15]). A prefix $\rho$ of a transcript $\Delta$ is called a safe-point
if there exists an accepting proof $(\alpha_r, \beta_r, \gamma_r)$ in the right interaction, such that


1. $\alpha_r$ occurs in $\rho$, but not $\beta_r$ (and $\gamma_r$).
2. For any proof $(\alpha_l, \beta_l, \gamma_l)$ in the left interaction, if only $\alpha_l$ occurs in $\rho$, then
$\beta_l$ occurs after $\gamma_r$.
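
The two conditions of the safe-point definition can be checked mechanically on a recorded transcript. The following Python sketch is illustrative only: the message encoding `(side, proof index, kind)` and the function name `is_safe_point` are assumptions made here, and "accepting" is approximated by requiring that all three messages of the right proof appear in the transcript.

```python
from typing import List, Tuple

# Hypothetical encoding of one message: (side, proof index, kind),
# where side is "left" or "right" and kind is "alpha", "beta" or "gamma".
Msg = Tuple[str, int, str]


def is_safe_point(transcript: List[Msg], prefix_len: int, right_proof: int) -> bool:
    """Check whether the first prefix_len messages form a safe-point
    with respect to the given proof in the right interaction."""
    pos = {m: i for i, m in enumerate(transcript)}
    a_r = ("right", right_proof, "alpha")
    b_r = ("right", right_proof, "beta")
    g_r = ("right", right_proof, "gamma")
    # Stand-in for "accepting": the right proof must be complete.
    if not all(m in pos for m in (a_r, b_r, g_r)):
        return False
    # Condition 1: alpha_r lies in the prefix, beta_r (hence gamma_r) does not.
    if not (pos[a_r] < prefix_len <= pos[b_r]):
        return False
    # Condition 2: every left proof with only alpha_l in the prefix
    # must have beta_l after gamma_r.
    left_ids = {idx for side, idx, _ in transcript if side == "left"}
    for idx in left_ids:
        a_l = ("left", idx, "alpha")
        b_l = ("left", idx, "beta")
        alpha_in_prefix = a_l in pos and pos[a_l] < prefix_len
        beta_in_prefix = b_l in pos and pos[b_l] < prefix_len
        if alpha_in_prefix and not beta_in_prefix:
            if b_l in pos and pos[b_l] < pos[g_r]:
                return False
    return True
```

In the first condition the prefix cuts the right proof between its first message and its challenge, which is exactly the place where a rewinding verifier can send a fresh challenge. The second condition ensures that answering this fresh challenge never requires a new left challenge before the right proof finishes.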


When protocol $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ is run concurrently, the next lemma guarantees that
there is a safe-point for a right interaction whose tag differs from that of the
left interaction.


Lemma 1 (Safe-point Lemma [15]).⁴ In any one-one man-in-the-middle
execution of $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$, if the right interaction has a different tag from the tag of
the left interaction, there exists a safe-point for the right interaction.


4 The safe-point lemma in [15] applies to any one-many concurrent execution
environment, where the adversary participates in one left interaction and polynomially many
right interactions. Here we use a simpler version of the safe-point lemma, where the
adversary participates in one left interaction and one right interaction.

3.2 Non-malleable Statistically Hiding Commitment Scheme


Let (SHC, SHR) be the statistically hiding commitment scheme [I] from any
one-way function $f$,⁵ and let $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ be a tag-based WI proof for NP. The
commitment protocol is shown in Fig. 3. The length of the tag is $m(n)$. Our construction
in fact compiles any statistically hiding commitment scheme with non-interactive
reveal phase into a non-malleable statistically hiding one with interactive reveal
phase, assuming the existence of one-way functions.


Protocol (C, R)
Security Parameter: $1^n$
Tag string: $\mathsf{tag} \in \{0,1\}^{m(n)}$
String to be committed: $v \in \{0,1\}^n$
Commit Phase:
C ⇔ R: Run the commit phase of commitment scheme (SHC, SHR), where C
runs SHC and R runs SHR.
R: Abort if the above commit phase fails.
Let com be the transcript of messages obtained. C records the decommitment
key in dec.
Reveal Phase:
Stage 1:
R → C: Pick uniformly $r_0, r_1 \in \{0,1\}^n$, compute $s_0 = f(r_0)$ and $s_1 = f(r_1)$,
and send $s_0, s_1$.
R ⇔ C: R and C engage in an execution of $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ with tag $\mathsf{tag}$, where
R uses $r_b$ as witness ($b \in \{0,1\}$) and runs $P_{\mathsf{tag}}$ to prove to C (running $V_{\mathsf{tag}}$)
knowledge of a value $r$ s.t. $s_0 = f(r)$ or $s_1 = f(r)$. The challenge length of the
verifier (i.e., C) is $2n$.
C: Abort if either $s_0$ or $s_1$ is not in the range of $f$ or the proof fails.
Stage 2: C → R: Send $v$.
Stage 3:
C ⇔ R: C and R engage in an execution of $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ with tag $\mathsf{tag}$, where C
runs $P_{\mathsf{tag}}$ to prove to R (running $V_{\mathsf{tag}}$) that there exists a value dec s.t. dec is
the valid decommitment key of com corresponding to $v$ or there exists a value
$r$ s.t. $s_0 = f(r)$ or $s_1 = f(r)$. The challenge length of the verifier (i.e., R) is $2n$.


Fig. 3. Non-malleable statistically hiding commitment scheme (C, R)
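
To make the Stage 3 statement concrete, the sketch below evaluates the OR-relation that the committer proves: either a valid opening of com to v, or a preimage of $s_0$ or $s_1$. Everything here is a stand-in chosen for illustration and not taken from the paper: SHA-256 plays the role of the one-way function f, and `toy_commit` / `toy_verify_opening` are a hypothetical hash-based commitment that is not statistically hiding. The WI proof itself is not modeled; only the witness relation is.

```python
import hashlib
import os
from typing import Callable, Tuple

Witness = Tuple[str, bytes]  # ("dec", decommitment key) or ("preimage", r)


def f(r: bytes) -> bytes:
    """ASSUMPTION: SHA-256 as a stand-in for the one-way function f."""
    return hashlib.sha256(r).digest()


def toy_commit(v: bytes, dec: bytes) -> bytes:
    """ASSUMPTION: toy commitment used only to exercise the relation."""
    return hashlib.sha256(dec + v).digest()


def toy_verify_opening(com: bytes, v: bytes, dec: bytes) -> bool:
    return toy_commit(v, dec) == com


def stage3_statement_holds(
    com: bytes, v: bytes, s0: bytes, s1: bytes, witness: Witness,
    verify_opening: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """True if the witness satisfies the Stage 3 OR-statement."""
    kind, value = witness
    if kind == "dec":
        # First branch: dec is a valid decommitment key of com for v.
        return verify_opening(com, v, value)
    if kind == "preimage":
        # Second branch: a preimage of s0 or s1 under f (the trapdoor).
        return f(value) in (s0, s1)
    return False
```

The second branch shows why a preimage acts as a trapdoor: it satisfies the statement for any claimed value v. An honest committer never learns such a preimage, whereas the simulator obtains one by extraction in Stage 1.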


Theorem 2. Suppose that (SHC, SHR) is a statistically hiding commitment
scheme with non-interactive reveal phase and $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ is a tag-based WI proof.
Then (C, R) is a non-malleable statistically hiding commitment scheme.


Remark 1. The commitment scheme shown in Fig. 3 is tag-based non-malleable.
Compared with existing tag-based commitment schemes [2,15,22], it may seem a bit
strange that our construction uses tags only in the reveal phase. In fact, this
approach is inspired by the work of [MIS]. Even though tag-based non-malleable
commitments can be transformed into content-based non-malleable commitments in
a standard way [2], we explicitly present one in Appendix A for reference.


5 Note the commitment scheme [I] is only for a single bit. By running their scheme in
parallel, we obtain a commitment scheme for strings of any polynomial length. Hence,
we also assume that the basic statistically hiding commitment scheme is for a string.


Remark 2. The high-level approach of our commitment scheme is to combine [A
with B5]. That is, to commit to $v$, in the commit phase, a sender commits to $v$
using the statistically hiding commitment scheme [I], and in the reveal phase,
the sender sends $v$ and proves, using a “simulation-extractable” argument,
that the commit phase transcript opens to $v$. The simulation strategy at a high
level is from [IA]. For technical reasons, naively using such simulation-extractable
arguments does not work. We need to modify the opening process by
adding a “trapdoor” that can be extracted and used by the simulator to cheat
in the reveal phase. This is the reason why we add one more phase (i.e., Stage
1). In BIS], by contrast, the trapdoor is only used in the hybrid experiment for
analysis and may therefore be hard-wired via a different analysis.


Proof (sketch). We need to prove the scheme satisfies the following three proper- 
ties: statistical hiding, computational binding and non-malleability with respect 
to opening. We start by proving the hiding and non-malleability properties and 
then return to the proof of the binding property. 


Statistical hiding. The hiding property follows directly from the hiding property 
of the commitment scheme (SHC,SHR). Note that (SHC,SHR) is statistically 
hiding, and so (C, R) is also statistically hiding. 


Non-malleability. We show that for every PPT man-in-the-middle adversary
A, there exists a probabilistic expected polynomial-time simulator S and a
negligible function $\mu$ such that for every polynomial-time computable relation
$R \subseteq \{0,1\}^n \times \{0,1\}^n$, for every tag $\mathsf{tag}$ of length $m(n)$, for every $v \in \{0,1\}^n$
and every $z \in \{0,1\}^*$, it holds that

$$\Pr[\mathsf{mim}^{A}_{\mathrm{open}}(R, v, z) = 1] \le \Pr[\mathsf{sim}^{S}_{\mathrm{open}}(R, v, z) = 1] + \mu(n) \qquad (1)$$


Denote by $A_{\mathrm{rev}}$ the state of A after the commit phase, i.e., $A_{\mathrm{rev}}$ contains A’s
description along with its configuration just before the reveal phase
starts.

We proceed to describe the simulator S. S, on input $z$ and security parameter
$1^n$, interacts with an honest receiver R and runs the adversary A internally.
During the commit phase, at a high level, S internally incorporates A and emulates
the commit phase of the left execution for adversary A by honestly committing
to $0^n$, while externally relaying messages in the right execution between A
and R.

Once the commit phase is finished, S receives a value $v$ and has to perform
the reveal phase internally with $A_{\mathrm{rev}}$. In Stage 1, S plays as an honest sender in
the left reveal phase and as an honest receiver in the right reveal phase. Once
the simulation of Stage 1 completes, S applies the safe-point lemma to find a
safe-point and extract a witness $w$ to the statement proved by $A_{\mathrm{rev}}$ in the left
reveal phase by standard rewinding.⁶ In Stage 2, S just sends $v$ to $A_{\mathrm{rev}}$ in the
left reveal phase. Then the simulation for Stage 3 begins. S uses a fake witness
(i.e., the trapdoor $w$) to simulate the left interaction for $A_{\mathrm{rev}}$, while emulating
the right interaction as an honest receiver. When the simulation for Stage 3
completes, S again applies the safe-point lemma to find a safe-point and
extract a witness $\tilde{w}$ (i.e., the decommitment key of A) in the right interaction.
Finally, by using $\tilde{w}$, S can complete the reveal phase of the external execution
with R.


More formally, S proceeds as follows on auxiliary input $z$ and tag $\mathsf{tag}$:


1. S internally incorporates A($z$).
2. During the commit phase S proceeds as follows:
(a) S internally emulates the left interaction for A by honestly committing to $0^n$.
(b) Messages from the right execution are forwarded externally to R.

3. Once the commit phase has finished, S receives the value $v$. Let com, $\widetilde{\mathsf{com}}$
denote the left and right execution transcripts, respectively.

4. During the reveal phase S internally incorporates $A_{\mathrm{rev}}$ and proceeds as follows:

(a) Stage 1 Main Execution Phase: S emulates a one-one man-in-the-middle
execution by playing as honest sender with tag $\mathsf{tag}$ on the left
and as honest receiver on the right. After completing the execution,
denote by $\Delta$ the transcript of messages obtained. Denote the right
tag by $\widetilde{\mathsf{tag}}$. We emphasize here that S can emulate the left interaction
independently of $v$ in Stage 1.

Stage 1 Rewinding Phase: Next, S attempts to extract the witness
used by $A_{\mathrm{rev}}$ on the left if $\widetilde{\mathsf{tag}} \neq \mathsf{tag}$.

i. In $\Delta$, find the first point $\rho$ that is a safe-point. Let the associated
proof be $(\alpha_\rho, \beta_\rho, \gamma_\rho)$.

ii. Repeat until a second proof transcript $(\alpha_\rho, \beta'_\rho, \gamma'_\rho)$ is obtained:
Emulate the left interaction as in the Stage 1 Main Execution
Phase. For the right interaction:

- If $A_{\mathrm{rev}}$ expects to get a new proof from the right receiver, S
then emulates the proof by generating $\mathsf{design}_0$ himself. Forward
one of the two proofs internally.

- If $A_{\mathrm{rev}}$ sends a challenge for a proof whose first message occurs
in $\rho$: cancel the execution, rewind to $\rho$ and continue.

iii. If $\beta'_\rho \neq \beta_\rho$, extract and record the witness $w$ from $(\alpha_\rho, \beta_\rho, \gamma_\rho)$
and $(\alpha_\rho, \beta'_\rho, \gamma'_\rho)$. Otherwise halt and output fail.

Finally, if the above (i.e., step 4a) runs for more than $2^n$ steps, halt and
output fail.

(b) Stage 2: Send $v$ to the adversary $A_{\mathrm{rev}}$.
(c) Stage 3 Main Execution Phase: By using $w$ as witness, S can easily
simulate the left interaction for $A_{\mathrm{rev}}$. The right interaction is emulated by
S adopting the honest receiver strategy. After completing the execution,
denote by $\Delta'$ the transcript of messages obtained in the execution
of Stage 2 and Stage 3.

Stage 3 Rewinding Phase: S attempts to extract the decommitment
key of $A_{\mathrm{rev}}$ on the right:
i. In $\Delta'$, find the first point $\tilde{\rho}$ that is a safe-point. Let the associated
proof be $(\tilde{\alpha}_{\tilde{\rho}}, \tilde{\beta}_{\tilde{\rho}}, \tilde{\gamma}_{\tilde{\rho}})$.

ii. Repeat until a second proof transcript $(\tilde{\alpha}_{\tilde{\rho}}, \tilde{\beta}'_{\tilde{\rho}}, \tilde{\gamma}'_{\tilde{\rho}})$ is obtained:
Emulate the right interaction as in the Stage 3 Main Execution
Phase. For the left interaction:

- If $A_{\mathrm{rev}}$ expects to get a new proof from the committer, S is free
to answer the request by using the witness $w$, except when $A_{\mathrm{rev}}$
sends a challenge for a proof whose first message occurs in $\tilde{\rho}$, in which
case S cancels the execution, rewinds to $\tilde{\rho}$ and continues.

iii. If $\tilde{\beta}'_{\tilde{\rho}} \neq \tilde{\beta}_{\tilde{\rho}}$, extract a witness $\tilde{w}$ from $(\tilde{\alpha}_{\tilde{\rho}}, \tilde{\beta}_{\tilde{\rho}}, \tilde{\gamma}_{\tilde{\rho}})$ and $(\tilde{\alpha}_{\tilde{\rho}}, \tilde{\beta}'_{\tilde{\rho}}, \tilde{\gamma}'_{\tilde{\rho}})$.
Otherwise halt and output fail.

iv. If $\tilde{w}$ is a valid decommitment key for (SHC, SHR), i.e., $(\widetilde{\mathsf{com}}, \tilde{w}, \tilde{v})$
is a legal transcript for (SHC, SHR), set rev = $\tilde{w}$. Otherwise halt
and output fail.

Finally, if the above (step 4c) runs for more than $2^n$ steps, halt and
output fail.

(d) If the right interaction is accepting and $\widetilde{\mathsf{tag}} \neq \mathsf{tag}$, and rev contains a
valid decommitment key, run the honest committer strategy on input
$\widetilde{\mathsf{com}}$ and decommitment key rev, value $\tilde{v}$ with tag $\widetilde{\mathsf{tag}}$.


6 In Stage 1, the receiver acts as a prover and the committer acts as a verifier. The
safe-point definition and the safe-point lemma still apply with right and left interchanged.
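
Both rewinding phases rely on special soundness: two accepting transcripts with the same first message and different challenges yield a witness. The sketch below illustrates that extraction step with Schnorr's protocol in a toy group. This is an assumption made purely for illustration; the paper uses generic special-sound WI proofs for NP, and the parameters `P`, `Q`, `G` and all function names are hypothetical.

```python
# Toy group parameters (ASSUMPTION, far too small for security):
# G = 2 has prime order Q = 11 modulo P = 23.
P, Q, G = 23, 11, 2


def prove(x: int, k: int, beta: int):
    """Schnorr transcript (alpha, beta, gamma) for witness x and nonce k."""
    alpha = pow(G, k, P)
    gamma = (k + beta * x) % Q
    return alpha, beta, gamma


def verify(h: int, alpha: int, beta: int, gamma: int) -> bool:
    """Accept iff G^gamma == alpha * h^beta (mod P), where h = G^x."""
    return pow(G, gamma, P) == (alpha * pow(h, beta, P)) % P


def extract(t1, t2):
    """Recover the witness from two transcripts sharing the first message.

    Returns None when the challenges coincide, which corresponds to the
    'halt and output fail' branch of the rewinding phases.
    """
    (a1, b1, g1), (a2, b2, g2) = t1, t2
    diff = (b1 - b2) % Q
    if a1 != a2 or diff == 0:
        return None
    return ((g1 - g2) * pow(diff, -1, Q)) % Q
```

Rewinding to a safe-point and sending a fresh challenge produces exactly such a pair of transcripts, so `extract` plays the role of steps iii in the two rewinding phases. The modular inverse via `pow(diff, -1, Q)` requires Python 3.8 or later.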


Running time of S. We show that S runs in expected polynomial time.
Note the time spent by S in the commit phase is poly($n$). After S extracts the
witness $\tilde{w}$, the time spent by S in step 4d is also poly($n$). Next, we show that the
expected time spent by S in the reveal phase (except the running time in step 4d) is
also poly($n$). For simplicity, we assume that S does not check the fail condition
and may run for more than $2^n$ steps (since this only increases the total running
time).

Recall that in the reveal phase, S rewinds A from two safe-points. We need
to show that the time spent in step 4a and in step 4c is expected polynomial. We first
analyze the time spent in step 4a during the simulation. Then, using the same
method, we show that the time spent in step 4c is also expected polynomial.

Note the time spent by S in the Stage 1 Main Execution Phase is poly($n$). We
then show the time spent in the Stage 1 Rewinding Phase is expected polynomial. The
analysis hereafter is similar to that in [15] but is simpler. Let $T(i)$ be the random
variable that describes the time spent in rewinding a proof after $i$ messages have been
exchanged. We show that $E[T(i)] \le \mathrm{poly}(n)$ and then, by linearity of expectation,
we conclude that the expected time spent by S in the Stage 1 Rewinding Phase
is $\sum_i E[T(i)] \le \sum_i \mathrm{poly}(n) \le \mathrm{poly}(n)$.

Next we bound $E[T(i)]$. Given a partial transcript of messages
$\rho$, let $\Pr[\rho]$ denote the probability that $\rho$ occurs as a prefix of the execution
emulated in the Stage 1 Main Execution Phase. Let $p_\rho$ denote the probability that $\rho$
is a safe-point⁷ and is rewound. From the construction of S, we know that S
keeps rewinding until it finds another accepting transcript $(\alpha_\rho, \beta'_\rho, \gamma'_\rho)$ for $\rho$,
canceling each rewinding for which $\rho$ is not a safe-point, i.e., $A_{\mathrm{rev}}$ requests the
second message of a proof in the right interaction whose first message occurs in $\rho$.
As the emulated committer and receiver act identically to the real committer and real
receiver in this stage, conditioned on $\rho$, a view occurring in a rewinding from $\rho$ is
the same as one occurring in the Stage 1 Main Execution Phase. Thus, the probability of
canceling a rewinding from $\rho$ is at most $1 - p_\rho$. Furthermore, the expected number
of rewindings is at most $\frac{1}{p_\rho}$. Therefore, the expected number of rewindings from
$\rho$ is at most $p_\rho \cdot \frac{1}{p_\rho} = 1$ and each rewinding takes at most poly($n$) steps, i.e.,
$E[T(i) \mid \rho] \le \mathrm{poly}(n)$. Thus,

$$E[T(i)] = \sum_{\rho \text{ of length } i} E[T(i) \mid \rho] \cdot \Pr[\rho] \le \mathrm{poly}(n) \cdot \sum_{\rho \text{ of length } i} \Pr[\rho] \le \mathrm{poly}(n)$$


The expected running time of S in step 4c is also polynomial, by an analysis
similar to the above. We omit the details.
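
The counting step in this argument can be checked numerically. The sketch below is a hypothetical illustration, not part of the proof: it evaluates a truncated series for the expected number of independent attempts until one succeeds with probability p, which converges to 1/p, and then weights it by the probability p that the rewinding is started at all. The function names and the truncation length are assumptions.

```python
def expected_trials(p: float, terms: int = 10_000) -> float:
    """Truncated series sum_{k>=1} k * p * (1 - p)^(k - 1), which tends to 1/p."""
    return sum(k * p * (1.0 - p) ** (k - 1) for k in range(1, terms + 1))


def expected_rewind_work(p: float) -> float:
    """Probability p of rewinding at all, times the expected number of attempts."""
    return p * expected_trials(p)
```

For example, with p = 0.25 the expected number of attempts is about 4 while the weighted quantity is about 1, mirroring the bound $p_\rho \cdot \frac{1}{p_\rho} = 1$ used above. The truncation is only accurate when p is not extremely small relative to `terms`.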


Analysis of the simulator S. In order to show equation (1), we define a
hybrid stand-alone simulator $\mathsf{HYB}_1$ that also receives $v$ as auxiliary input. $\mathsf{HYB}_1$
proceeds exactly as S except that in the commit phase, instead of feeding A a
commitment to $0^n$, $\mathsf{HYB}_1$ feeds A a commitment to $v$.

Since both the experiments S and $\mathsf{HYB}_1$ are efficiently computable, the
following claim follows directly from the hiding property of (SHC, SHR).


Claim 1. There exists some negligible function $\mu'$ such that

$$\left|\Pr[\mathsf{sim}^{S}_{\mathrm{open}}(R, v, z) = 1] - \Pr[\mathsf{sim}^{\mathsf{HYB}_1}_{\mathrm{open}}(R, v, z) = 1]\right| \le \mu'(n)$$


Next we proceed to show the following claim.


Claim 2. There exists some negligible function $\mu''$ such that

$$\left|\Pr[\mathsf{mim}^{A}_{\mathrm{open}}(R, v, z) = 1] - \Pr[\mathsf{sim}^{\mathsf{HYB}_1}_{\mathrm{open}}(R, v, z) = 1 \mid \neg\mathsf{fail}]\right| \le \mu''(n)$$


Proof (sketch). Note the view of A in the commit phase in a real interaction is
identical to the view of A in $\mathsf{HYB}_1$. Furthermore, since $\mathsf{HYB}_1$ feeds A messages
according to the correct distribution in Stage 1, the view of $A_{\mathrm{rev}}$ in the simulation
of Stage 1 by experiment $\mathsf{HYB}_1$ is identical to the view of $A_{\mathrm{rev}}$ in a real
interaction. The view of $A_{\mathrm{rev}}$ in the simulation of Stage 3 by $\mathsf{HYB}_1$ is computationally
indistinguishable from its view in a real interaction, which follows from the
witness-indistinguishability of $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$.
As the safe-point lemma shows, when the right interaction has a different tag
from the left interaction, there is a safe-point. Hence, according to the actions
of $\mathsf{HYB}_1$, it will either output fail or succeed in the extraction from $A_{\mathrm{rev}}$.
Conditioned on $\mathsf{HYB}_1$ not outputting fail, by the computational binding property of
(SHC, SHR), except with negligible probability, the witness $\tilde{w}$ and the value $\tilde{v}$
extracted by $\mathsf{HYB}_1$ are the valid decommitment key and committed value of A,
respectively.

We next show that $\left|\Pr[\mathsf{sim}^{\mathsf{HYB}_1}_{\mathrm{open}}(R, v, z) = 1] - \Pr[\mathsf{sim}^{\mathsf{HYB}_1}_{\mathrm{open}}(R, v, z) = 1 \mid \neg\mathsf{fail}]\right|$ is
negligible by proving that the probability that event fail happens is negligible. This
together with Claim 1 and Claim 2 yields Eq. (1).


7 Note the roles of C and R interchange in Stage 1, where C acts as a verifier and R
acts as a prover. The safe-point lemma is used with the right and the left
interchanged.


Claim 3. $\mathsf{HYB}_1$ outputs fail with negligible probability.


Proof. The proof of this claim is similar to that of [15]. More precisely, $\mathsf{HYB}_1$
outputs fail only in three cases: $\mathsf{HYB}_1$ runs for more than $2^n$ steps; or the same
proof transcript is obtained from some safe-point; or the witness extracted is
not a valid decommitment. The arguments for the first two cases are almost
the same as those in [15]. The main difference lies in the analysis of the third
case.


$\mathsf{HYB}_1$ runs for more than $2^n$ steps: We know that the expected running times
of $\mathsf{HYB}_1$ and S are the same, i.e., poly($n$). Using Markov's inequality, we
conclude that the probability that $\mathsf{HYB}_1$ runs for more than $2^n$ steps is at most
$\frac{\mathrm{poly}(n)}{2^n}$.

The same proof transcript is obtained from some safe-point: This case
occurs if $\mathsf{HYB}_1$ picks some challenge $\beta$ (resp. $\tilde{\beta}$) in the Stage 1 (resp. Stage 3)
Rewinding Phase that appeared as a challenge in the Stage 1 (resp. Stage
3) Main Execution Phase. As $\mathsf{HYB}_1$ runs for at most $2^n$ steps, it picks
at most $2^n$ challenges. Furthermore, the length of each challenge is $2n$.
By applying the union bound, we obtain that the probability that a $\beta$
(resp. $\tilde{\beta}$) is picked twice is at most $\frac{2^n}{2^{2n}}$. Since there are at most
polynomially many challenges in Stage 1 (resp. Stage 3), using the union bound again,
we conclude that the probability that it outputs fail in this case is
negligible.

The witness extracted is not a valid decommitment:⁸ Suppose, on the
contrary, the witness extracted is not the decommitment key for (SHC, SHR);
then by the special-sound property, it must be a value $r'$
such that $f(r') = s_{b'}$ for some $b' \in \{0,1\}$. Denote by $r_b$ ($b \in \{0,1\}$) the
witness used by $\mathsf{HYB}_1$ in Stage 1 of the right interaction. If $b' = 1 - b$, then
we can break the one-way function $f$. Given A, $z$ and $v$, we construct an
algorithm B that inverts $f$. The input to B is an $n$-bit string $y = f(x)$,
where $x$ was chosen randomly from $\{0,1\}^n$. B wants to output a preimage
of $y$ under $f$. B proceeds as follows: B runs identically to $\mathsf{HYB}_1$ with
inputs $z, v$, with the exception that when simulating the right receiver for A
in Stage 1 of the reveal phase, it picks a random bit $b \in \{0,1\}$ and a
random string $r_b \in \{0,1\}^n$, and sets $s_b = f(r_b)$, $s_{1-b} = y$. By using $r_b$ as
witness, it can simulate the right interaction with $A_{\mathrm{rev}}$ easily. Finally, if
B extracts a witness $r'$ where $f(r') = y$, then we break the one-wayness
of $f$. The probability that B inverts $f$ is identical to the probability that
$\mathsf{HYB}_1$ inverts $f$, which is non-negligible. This contradicts the one-wayness
of $f$.

We therefore have only to deal with the case that B always outputs $r'$
such that $f(r') = s_b$, i.e., B always outputs the same preimage it knows. Then
we can break the witness indistinguishability of the underlying special-sound
proofs as follows: Recall that the proof $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ in Stage 1 of the right
interaction contains $4m$ special-sound WI proofs. The above assumption
is that B always extracts the same preimage used by itself in Stage 1 of the right
interaction. We know that if the $4m$ proofs use $r_0$, B outputs $r_0$,
and if the $4m$ proofs use $r_1$, B outputs $r_1$. Applying a standard hybrid
argument, there exists $i \in [4m]$ such that, when using $r_0$ for the first $i - 1$ proofs
and $r_1$ for the last $4m - i$ proofs, the witness used in the $i$-th special-sound proof is
the same as the witness extracted by B. We can use this session to
break the witness-indistinguishability of the special-sound WI proof. The
probability that we break the witness-indistinguishability property of the underlying
special-sound proof is $\frac{1}{4m}$ times the probability that $\mathsf{HYB}_1$ inverts $f$, which
is non-negligible. This contradicts the witness-indistinguishability property
of the underlying special-sound proof.


8 The proof in this case heavily relies on the “simulation-extractability” property of
$(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ in Stage 1. An ordinary WI proof of knowledge does not suffice here, as
the problem in this case is reduced to the security of one-way functions or
witness-indistinguishability of underlying subprotocols, in the presence of an expected PPT
adversary who can rewind the same subprotocols.
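
The way the inverter B embeds its challenge into Stage 1 can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 on 32-byte inputs stands in for a length-preserving one-way function f, and the names `embed_challenge` and `inverts` are hypothetical. The simulation of the protocol and the extraction itself are not modeled.

```python
import hashlib
import secrets

N_BYTES = 32  # toy value for the input length n


def f(x: bytes) -> bytes:
    """ASSUMPTION: SHA-256 as a stand-in for the one-way function f."""
    return hashlib.sha256(x).digest()


def embed_challenge(y: bytes):
    """B's Stage 1 setup: pick b and r_b, set s_b = f(r_b) and s_{1-b} = y."""
    b = secrets.randbelow(2)
    r_b = secrets.token_bytes(N_BYTES)
    s = [b"", b""]
    s[b] = f(r_b)
    s[1 - b] = y
    return b, r_b, s[0], s[1]


def inverts(y: bytes, r_prime: bytes) -> bool:
    """B succeeds if the extracted value is a preimage of its challenge y."""
    return f(r_prime) == y
```

Since B knows a preimage of only one of the two elements, it can still run the Stage 1 proof honestly with witness `r_b`. Any extracted preimage of the other element inverts the challenge, which is the first case of the argument above.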


Computational binding. The binding property intuitively follows from the
binding property of the underlying commitment scheme (SHC, SHR) and the
special-sound property (or, more precisely, the proof of knowledge property) of the underlying
proof in $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$. A formal proof proceeds along the lines of the proof of
non-malleability. More precisely, suppose there exists an adversary A that can violate
the binding property of (C, R); then we design an algorithm A’ that violates the
binding property of (SHC, SHR). A’ incorporates A and relays the commit phase
messages to an external honest receiver SHR. In the reveal phase, there is no
need for A’ to simulate the left interaction for A. Note that in the non-malleability
proof, two extractions are executed. Here, we only execute one extraction by
standard rewinding, and obtain the decommitment key. Using this information,
A’ can easily complete the reveal phase with SHR. It follows from the
witness-indistinguishability property of $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ that the probability that A’ breaks
the binding property of (SHC, SHR) is negligibly close to the probability that A
breaks the binding property of (C, R).


Schedule of messages: In the non-malleability proof, the design of S is based on
an unstated assumption, i.e., in the reveal phase, Stage 3 on both interactions
will not start unless the simulations for Stage 1 are completed. This assumption
is without loss of generality.

Consider the scenario where the simulation for Stage 1 of the left interaction
and Stage 3 of the right interaction overlap. The simulation goes well, as the
adversary runs as a prover in Stage 3 of the right interaction, and the rewinding
of Stage 1 of the left interaction will not “rewind” Stage 3 of the right
interaction (i.e., the adversary can only answer the left challenge by itself, without
the help from the right interaction). By using the safe-point lemma, the
simulator can still find a safe-point and extract the witness to the statement proved
by the adversary by standard rewinding. Furthermore, the adversary also runs
as a prover in Stage 1 of the left interaction, and the rewinding of Stage 3 of
the right interaction will not “rewind” Stage 1 of the left interaction. For
a simpler but similar reason, when the simulation for Stage 3 of the
left interaction and Stage 1 of the right interaction overlap, the simulator has
no difficulty and the two extractions also perform well. We take special note
of the fact that the safe-point lemma establishes the existence of a safe-point in any
one-one concurrent execution environment, and considers an environment where
one side of the interaction is empty as a special case.

A A Content-Based Non-malleable Commitment Scheme


Let (SHC, SHR) be the statistically hiding commitment scheme [I] from any
one-way function and let $(P_{\mathsf{tag}}, V_{\mathsf{tag}})$ be a tag-based WI proof for all NP. Let SS =
(SG, Sig, SVer) be a secure signature scheme. The content-based non-malleable
statistically hiding commitment scheme is shown in Fig. 4. Due to the page limit,
the formal proof is omitted here.

Protocol (C, R)
Security Parameter: $1^n$
String to be committed: $v \in \{0,1\}^n$
Commit Phase:
C ⇔ R: Run the commit phase of commitment scheme (SHC, SHR).
R: Abort if the above commit phase fails.
Denote the above transcript as com. C records the decommitment key in dec.
Reveal Phase:
Stage 1:
R → C: Set $(pk_0, sk_0) \leftarrow \mathsf{SG}(1^n)$ and send $pk_0$.
R → C: Pick uniformly $r_0, r_1 \in \{0,1\}^n$, compute $s_0 = f(r_0)$ and $s_1 = f(r_1)$,
and send $s_0, s_1$.
R ⇔ C: R and C engage in an execution of $(P_{pk_0}, V_{pk_0})$ with tag $pk_0$, where
R uses $r_b$ as witness ($b \in \{0,1\}$) and runs $P_{pk_0}$ to prove to C (running $V_{pk_0}$)
that there exists a value $r$ s.t. $s_0 = f(r)$ or $s_1 = f(r)$. The challenge length of
the verifier (i.e., C) is $2n$. C aborts if either $s_0$ or $s_1$ is not in the range of $f$
or the proof fails.
R → C: Let $tr_0$ be the transcript so far. Set $\sigma_0 \leftarrow \mathsf{Sig}(tr_0, sk_0)$ and send $\sigma_0$.
C: Abort if $\mathsf{SVer}(pk_0, tr_0, \sigma_0) \neq 1$.
Stage 2: C → R: Send $v$.
Stage 3:
C → R: Set $(pk_1, sk_1) \leftarrow \mathsf{SG}(1^n)$ and send $pk_1$.
C ⇔ R: C and R engage in an execution of $(P_{pk_1}, V_{pk_1})$ with tag $pk_1$, where
C uses witness dec and runs $P_{pk_1}$ to prove to R (running $V_{pk_1}$) that there
exists a value dec s.t. dec is the decommitment key of com corresponding to $v$
or there exists a value $r$ s.t. $s_0 = f(r)$ or $s_1 = f(r)$. The challenge length of
the verifier (i.e., R) is $2n$.
C → R: Let $tr_1$ be the transcript so far. Set $\sigma_1 \leftarrow \mathsf{Sig}(tr_1, sk_1)$ and send $\sigma_1$.
R: Abort if $\mathsf{SVer}(pk_1, tr_1, \sigma_1) \neq 1$.


Fig. 4. Non-malleable statistically hiding commitment scheme (C, R)
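
The transcript-signing steps of Fig. 4 only require some secure signature scheme SS = (SG, Sig, SVer). As a rough illustration, the sketch below instantiates that interface with a Lamport one-time signature built from SHA-256. This choice is an assumption made here for the sake of a self-contained example; the paper does not fix a particular scheme, and the byte encoding of the transcript is hypothetical.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def SG():
    """Key generation: 256 pairs of random secrets; pk holds their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return pk, sk


def _bits(msg: bytes):
    """The 256 bits of H(msg), most significant bit of each byte first."""
    digest = H(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def Sig(transcript: bytes, sk):
    """Reveal one secret per bit of the hashed transcript (one-time use only)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(transcript))]


def SVer(pk, transcript: bytes, sigma) -> bool:
    """Check every revealed secret against the matching public hash."""
    if len(sigma) != 256:
        return False
    return all(H(sigma[i]) == pk[i][bit] for i, bit in enumerate(_bits(transcript)))
```

In this sketch the verification key doubles as the tag of the WI proof and the signature binds the whole stage transcript to that key, which is the role $pk_0$ and $\sigma_0$ (resp. $pk_1$ and $\sigma_1$) play in Fig. 4. Each key pair must sign at most one transcript.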
Abstract. Proofs of storage (PoS) are interactive protocols allowing 
a client to verify that a server faithfully stores a file. Previous work 
has shown that proofs of storage can be constructed from any homo- 
morphic linear authenticator (HLA). The latter, roughly speaking, are 
signature/message authentication schemes where ‘tags’ on multiple mes- 
sages can be homomorphically combined to yield a ‘tag’ on any linear 
combination of these messages. 

We provide a framework for building public-key HLAs from any iden- 
tification protocol satisfying certain homomorphic properties. We then 
show how to turn any public-key HLA into a publicly-verifiable PoS with 
communication complexity independent of the file length and supporting 
an unbounded number of verifications. We illustrate the use of our trans- 
formations by applying them to a variant of an identification protocol by 
Shoup, thus obtaining the first unbounded-use PoS based on factoring 
(in the random oracle model). 


1 Introduction 


Advances in networking technology and the rapid accumulation of information 
have fueled a trend toward outsourcing data management to external service 
providers (“servers”). By doing so, organizations can concentrate on their core 
tasks rather than incurring the substantial hardware, software and personnel 
costs involved in maintaining data “in house”. 

Outsourcing storage prompts a number of interesting challenges. One prob- 
lem is to verify that the server continually and faithfully stores the entire file f 
entrusted to it by the client. The server is untrusted in terms of both secu- 
rity and reliability: it might maliciously or accidentally erase the data or place 
it onto temporarily unavailable storage media. This could occur for numerous 
reasons including cost-savings or external pressures (e.g., government censure). 


* Portions of this work done while at Johns Hopkins. 
** Portions of this work done while at IBM. Research supported by NSF grant 
#0426683.

The server might also accidentally erase some data and choose not to notify the
client. Exacerbating the problem (and precluding naive approaches) are factors 
such as limited bandwidth between the client and server, as well as the client’s 
limited resources. See [III] for a more thorough discussion. 

If we allow communication complexity linear in f, there is a simple mechanism 
allowing the client to verify that the server stores f at any given time: When 
the client uploads f, the client locally stores a hash of f; to verify, the server 
simply sends all of f and the client checks that this hashes to the correct value. 
For our purposes, we are interested in solutions with communication complexity 
that is much smaller than (and, ideally, independent of) the file size. 
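
The simple linear-communication mechanism described above can be sketched as follows. This is only an illustration: the text does not fix a hash function, so SHA-256 is an assumption here, and the function names are hypothetical.

```python
import hashlib

def client_store(file_bytes: bytes) -> bytes:
    """Client keeps only a short digest of the file it uploads."""
    return hashlib.sha256(file_bytes).digest()

def server_respond(stored_file: bytes) -> bytes:
    """Server answers an audit by sending the whole file (linear communication)."""
    return stored_file

def client_verify(digest: bytes, response: bytes) -> bool:
    """Client re-hashes the response and compares it with the stored digest."""
    return hashlib.sha256(response).digest() == digest

f = b"example file contents" * 1000
d = client_store(f)
honest = client_verify(d, server_respond(f))        # server still holds all of f
cheating = client_verify(d, server_respond(f[:-1])) # server lost the last byte
```

The client's state is a single digest, but every audit costs as much communication as re-downloading the file, which is exactly what the constructions below avoid.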

Ateniese et al. [ and Juels and Kaliski [I] independently introduced ap- 
proaches to this problem having sub-linear communication complexity. (Earlier 
work by Naor and Rothblum is related, but considers a somewhat weaker 
adversarial model.) Ateniese et al. also distinguish between the case of private 
verifiability, where only the original client (or anyone with whom that client 
shares a key) can verify the server’s storage, and public verifiability, where any- 
one knowing the client’s public key can perform verification. Extensions and 
improvements were given by Shacham and Waters [J], Dodis, Vadhan, and 
Wichs [5], and Bowers, Juels, and Oprea [4]. We refer to B] for a more detailed 
comparison among the existing schemes. 

Here, we are interested in publicly-verifiable schemes that can be used for an 
unbounded number of verifications. A useful tool for this, implicit in [I] and 
further studied in [5], is a homomorphic linear authenticator (HLA), which 
can be defined in either the private- or public-key setting. Roughly speaking, 
this primitive allows a client to ‘tag’ each block $f_i$ of a file $\mathbf{f} = f_1 | \cdots | f_n$ in such
a way that for any vector $\mathbf{c}$ the server can homomorphically construct a (short)
tag authenticating the value $\sum_i c_i \cdot f_i$.

Two recent works have considered the dynamic setting, where the remotely- 
stored data can be updated [26]. We do not address this problem here. 


1.1 Our Contributions 


The main contribution of this paper is to show a general mechanism (in the random
oracle model) for constructing public-key HLAs from any identification
protocol that is suitably homomorphic. The RSA-based HLA used by Ateniese
et al. [I] (see also [A Appendix E]) can be viewed as an instance of our mech- 
anism applied to the Guillou-Quisquater [IO] identification protocol; similarly, 
the Shacham-Waters scheme [[4] can be seen as being derived from an under- 
lying identification protocol in bilinear groups. By applying our transformation 
to a variant of Shoup’s identification scheme based on factoring [I5], we ob- 
tain the first publicly-verifiable HLA based on factoring (in the random oracle 
model). 

We also show a generic transformation from any HLA to a publicly-verifiable 
proof of storage with communication complexity independent of the file size. This 
transformation is in the standard model, and answers an open question from [14]. 
An analogous transformation with similar properties was shown (independently)
by Dodis et al. in the setting of simpler private verifiability; our technique is
different from theirs and is of independent interest. 

Combining our results, we obtain a publicly-verifiable proof of storage based 
on the factoring assumption in the random oracle model. In our PoS, the communication
complexity and the size of the client’s state are independent¹ of the
file size, and the server’s storage is a constant multiple of the file size. In the PoS
we describe, the computation of both the client and the server is linear in the file
size, but notice that public-key HLAs can be layered on top of erasure codes (as
in [14, §4]) or used in conjunction with a probabilistic approach for multiple audits
(as in [I]) to obtain better performance while retaining public verifiability.


2 Definitions 


We write $x \leftarrow X$ to represent an element $x$ being sampled uniformly at random
from a set $X$. The output $y$ of a randomized algorithm $A$ running on input $x$ is
denoted by $y \leftarrow A(x)$. We sometimes write $y := A(x; r)$ to denote the (deterministic)
result of running $A$ on input $x$ and random coins $r$. We use boldface
to denote vectors. Given a vector $\mathbf{v}$ we let $v_i$ denote its $i$th component.
Throughout, $k \in \mathbb{N}$ denotes the security parameter. A function $\nu : \mathbb{N} \to \mathbb{R}$ is
negligible if for every polynomial $p(\cdot)$ and large enough $k$, we have $\nu(k) < 1/p(k)$.


2.1 Homomorphic Linear Authenticators 


Homomorphic linear authenticators (HLAs) were introduced by Ateniese et al. [I] 
as a building block for constructing communication-efficient proofs of storage; 
they were further studied in [5]. At a high level, HLAs are used as follows: 
viewing the file $\mathbf{f}$ as an $n$-dimensional vector, the client begins by tagging each
element of $\mathbf{f}$ and then sending both $\mathbf{f}$ and the vector of tags $\mathbf{t}$ to the server. To
verify that the server is storing the entire file, the client sends a random challenge
vector $\mathbf{c}$ and the server returns $\mu = \sum_i c_i f_i$ along with a tag $\tau$, computed using
$\mathbf{f}$, $\mathbf{t}$, and $\mathbf{c}$, which is supposed to authenticate this value.

HLAs can be defined both in the private and public-key settings. We give a 
definition for public-key HLAs and refer the reader to for a formalization of 
private-key HLAs. 


Definition 1 (Homomorphic linear authenticator). A public-key homomorphic
linear authenticator is a tuple of four PPT algorithms (Gen, Tag, Auth,
Vrfy) such that:


$(pk, sk) \leftarrow \mathrm{Gen}(1^k)$ is a probabilistic algorithm used to set up the scheme. It
takes as input the security parameter and outputs a public and private key
pair $(pk, sk)$. We assume $pk$ defines a $k$-bit prime $p$ and a positive integer $B$.

$(\mathbf{t}, st) \leftarrow \mathrm{Tag}_{sk}(\mathbf{f})$ is a probabilistic algorithm that is run by the client in order
to tag a file. It takes as input a secret key $sk$ and a file $\mathbf{f} \in [B]^n$, and outputs
a vector of tags $\mathbf{t}$ and state information $st$.


¹ The communication complexity for a file of size $n$ is $O(\log n + k)$, and as in [5] we
assume $k > \log n$.

$\tau := \mathrm{Auth}_{pk}(\mathbf{f}, \mathbf{t}, \mathbf{c})$ is a deterministic algorithm that is run by the server to
generate a tag. It takes as input a public key $pk$, a file $\mathbf{f} \in [B]^n$, a tag
vector $\mathbf{t}$, and a challenge vector $\mathbf{c} \in \mathbb{Z}_p^n$; it outputs a tag $\tau$.

$b := \mathrm{Vrfy}_{pk}(st, \mu, \mathbf{c}, \tau)$ is a deterministic algorithm that is used to verify a tag.
It takes as input a public key $pk$, state information $st$, an element $\mu \in \mathbb{N}$,
a challenge vector $\mathbf{c} \in \mathbb{Z}_p^n$, and a tag $\tau$. It outputs a bit, where ‘1’ indicates
acceptance and ‘0’ indicates rejection.


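For correctness, we require that for all $k \in \mathbb{N}$, all $(pk, sk)$ output by $\mathrm{Gen}(1^k)$, all
$\mathbf{f} \in [B]^n$, all $(\mathbf{t}, st)$ output by $\mathrm{Tag}_{sk}(\mathbf{f})$, and all $\mathbf{c} \in \mathbb{Z}_p^n$, it holds that

$$\mathrm{Vrfy}_{pk}\Big(st, \sum_i c_i f_i, \mathbf{c}, \mathrm{Auth}_{pk}(\mathbf{f}, \mathbf{t}, \mathbf{c})\Big) = 1.$$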


We remark that in certain schemes correctness (and security) may hold even
when Vrfy is given only $\sum_i c_i f_i \bmod p$ (assuming $B < p$). In such cases the
communication from the server to the client can be further reduced.

Informally, an HLA is secure if, for a given file $\mathbf{f}$ and challenge vector $\mathbf{c}$, no
adversary can output a valid authenticator for an element $\mu' \neq \sum_i c_i f_i$.


Definition 2 (Unforgeability for public-key HLAs). Let $\Lambda$ = (Gen, Tag,
Auth, Vrfy) be a public-key HLA and $\mathcal{A}$ be an adversary, and consider the following
experiment:


1. The challenger computes $(pk, sk) \leftarrow \mathrm{Gen}(1^k)$, where $pk$ defines $p$ and $B$.

2. Given $pk$ and oracle access to $\mathrm{Tag}_{sk}(\cdot)$, adversary $\mathcal{A}$ outputs a file $\mathbf{f} \in [B]^n$.

3. The challenger tags the file by computing $(\mathbf{t}, st) \leftarrow \mathrm{Tag}_{sk}(\mathbf{f})$.

4. Given $\mathbf{t}$ and $st$, the adversary $\mathcal{A}$ outputs a challenge vector $\mathbf{c} \in \mathbb{Z}_p^n$, an
element $\mu' \in \mathbb{Z}$, and a tag $\tau'$.

5. The adversary succeeds if $\mu' \neq \sum_i c_i f_i$ and $\mathrm{Vrfy}_{pk}(st, \mu', \mathbf{c}, \tau') = 1$.


$\Lambda$ is unforgeable if the success probability of every PPT adversary $\mathcal{A}$ in the above
experiment is negligible.


The distinctions between the case of public verifiability (as defined above) and 
private verifiability (as defined in [}) are that, in the former setting (1) ver- 
ification does not require the original secret key sk but only the state st and 
the original public key; (2) unforgeability holds even against an adversary who 
knows the public information pk and st. Our definition is also stronger than the 
one given in [5] in that we initially give the adversary access to a tagging oracle. 


2.2 Homomorphic Identification Protocols 


An identification protocol allows a prover $P$ in possession of a secret key $sk$ to
prove its identity to a verifier $V$ that possesses the corresponding public key $pk$.
We consider 3-move identification protocols where the prover generates the first
message $\alpha$ using the public key $pk$ and randomness $r$; the verifier sends a random
challenge $\beta$; and the prover then computes a response $\gamma$ using $(pk, sk)$, the
randomness $r$, and the verifier’s challenge $\beta$. Given the transcript of the protocol,
the verifier decides whether to accept or not.

Definition 3 (Identification protocol). An identification protocol is a three-move
protocol between a PPT prover $P$ and a PPT verifier $V$. The protocol consists
of four polynomial-time algorithms (Setup, Comm, Resp, Vrfy) such that:


$(pk, sk) \leftarrow \mathrm{Setup}(1^k)$ is a probabilistic algorithm that takes as input the security
parameter and outputs a public and private key pair $(pk, sk)$.

$\alpha \leftarrow \mathrm{Comm}(pk; r)$ is a probabilistic algorithm run by the prover $P$ to generate
the first message. It takes as input the public key and random coins $r$, and
outputs an initial message $\alpha$. We stress that there is no need for $sk$.

$\gamma \leftarrow \mathrm{Resp}(pk, sk, r, \beta)$ is a probabilistic algorithm that is run by the prover $P$
to generate the third message. It takes as input the public key $pk$, the secret
key $sk$, a random string $r$, and a challenge $\beta$ (from some associated challenge
space), and outputs a response $\gamma$.

$b := \mathrm{Vrfy}(pk, \alpha, \beta, \gamma)$ is a deterministic algorithm run by the verifier $V$ to decide
whether to accept the interaction. It takes as input the public key $pk$, an
initial message $\alpha$, a challenge $\beta$, and a response $\gamma$. It outputs a bit $b$, where
‘1’ indicates acceptance and ‘0’ indicates rejection.


For correctness, we require that for all $k \in \mathbb{N}$, all $(pk, sk)$ output by $\mathrm{Setup}(1^k)$,
all random coins $r$, and all $\beta$ in the appropriate challenge space, it holds that

$$\mathrm{Vrfy}\big(pk, \mathrm{Comm}(pk; r), \beta, \mathrm{Resp}(pk, sk, r, \beta)\big) = 1.$$


An identification protocol is homomorphic if the verification of several transcripts
of the protocol can be “batched”:


Definition 4 (Homomorphic identification protocol). An identification
protocol $\Sigma$ = (Setup, Comm, Resp, Vrfy) is homomorphic if there exist efficient
functions $\mathrm{Combine}_1$, $\mathrm{Combine}_3$ such that:


Completeness: For all $(pk, sk)$ output by $\mathrm{Setup}(1^k)$ and all $\mathbf{c} \in \mathbb{Z}_{2^k}^n$, if transcripts
$\{(\alpha_i, \beta_i, \gamma_i)\}_{1 \le i \le n}$ are such that $\mathrm{Vrfy}(pk, \alpha_i, \beta_i, \gamma_i) = 1$ for all $i$, then:

$$\mathrm{Vrfy}\Big(pk, \mathrm{Combine}_1(\mathbf{c}, \boldsymbol{\alpha}), \sum_i c_i \beta_i, \mathrm{Combine}_3(\mathbf{c}, \boldsymbol{\gamma})\Big) = 1.$$


Unforgeability: Consider the following experiment involving an adversary $\mathcal{A}$:


1. The challenger computes $(pk, sk) \leftarrow \mathrm{Setup}(1^k)$ and gives $pk$ to $\mathcal{A}$.

2. The following is repeated a polynomial number of times:

   - $\mathcal{A}$ outputs $\beta'$ in the challenge space. The challenger chooses random
     $r$, computes $\gamma := \mathrm{Resp}(pk, sk, r, \beta')$, and gives $(r, \gamma)$ to $\mathcal{A}$.

3. The adversary outputs an $n$-vector of challenges $\boldsymbol{\beta}$. Then for each $i$ the
challenger chooses $r_i$ at random, sets $\alpha_i := \mathrm{Comm}(pk; r_i)$ and
$\gamma_i := \mathrm{Resp}(pk, sk, r_i, \beta_i)$, and gives $(\mathbf{r}, \boldsymbol{\gamma})$ to $\mathcal{A}$.

4. $\mathcal{A}$ outputs a triple $(\mathbf{c}, \mu', \gamma')$, where $\mathbf{c} \in \mathbb{Z}_{2^k}^n$. The adversary succeeds if
(1) $\mu' \neq \sum_i c_i \beta_i$ and (2) $\mathrm{Vrfy}(pk, \mathrm{Combine}_1(\mathbf{c}, \boldsymbol{\alpha}), \mu', \gamma') = 1$.


2.3 Proofs of Storage


Definition 5 (Proof of storage). A (publicly-verifiable) proof of storage is a
tuple of four PPT algorithms (Gen, Encode, Prove, Vrfy) such that:


$(pk, sk) \leftarrow \mathrm{Gen}(1^k)$ is a probabilistic algorithm that is run by the client to set
up the scheme. It takes as input a security parameter, and outputs a public
and private key pair $(pk, sk)$. We assume $pk$ defines a $k$-bit prime $p$ and a
positive integer $B$.

$(f', st) \leftarrow \mathrm{Encode}_{sk}(\mathbf{f})$ is a probabilistic algorithm that is run by the client
in order to encode the file. It takes as input the secret key $sk$, and a file
$\mathbf{f} \in [B]^n$. It outputs an encoded file $f'$ and state information $st$.

$\pi := \mathrm{Prove}(pk, f', \mathbf{c})$ is a deterministic algorithm that takes as input the public
key $pk$, an encoded file $f'$, and a challenge $\mathbf{c} \in \mathbb{Z}_p^n$. It outputs a proof $\pi$.

$b := \mathrm{Vrfy}(pk, st, \mathbf{c}, \pi)$ is a deterministic algorithm that takes as input the public
key $pk$, the state $st$, a challenge $\mathbf{c} \in \mathbb{Z}_p^n$, and a proof $\pi$. It outputs a bit,
where ‘1’ indicates acceptance and ‘0’ indicates rejection.


We require that for all $k \in \mathbb{N}$, all $(pk, sk)$ output by $\mathrm{Gen}(1^k)$, all $\mathbf{f} \in [B]^n$, all
$(f', st)$ output by $\mathrm{Encode}_{sk}(\mathbf{f})$, and all $\mathbf{c} \in \mathbb{Z}_p^n$, it holds that

$$\mathrm{Vrfy}\big(pk, st, \mathbf{c}, \mathrm{Prove}(pk, f', \mathbf{c})\big) = 1.$$


Note that the above defines a publicly-verifiable PoS since the original secret key 
sk is not needed in order to perform verification. 

Security of a PoS, roughly speaking, guarantees that if the verifier accepts 
then the prover indeed has (sufficient information to recover) the entire original 
file f. As noted in MOM], soundness can be formalized using the notion of a 
knowledge extractor [ØB]. As in J, we phrase our definition using the paradigm 
of “witness-extended emulation” [2]. 

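Definition 6 (Security for a publicly-verifiable PoS). Let $\Pi$ = (Gen,
Encode, Prove, Vrfy) be a publicly-verifiable PoS. $\Pi$ is secure if there is an
expected polynomial-time knowledge extractor $\mathcal{K}$ such that, for any PPT adversary
$\mathcal{A}$ we have:

1. The distributions

$$\Big\{ (pk, sk) \leftarrow \mathrm{Gen}(1^k);\ (\mathbf{f}, st_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathrm{Encode}_{sk}(\cdot)}(pk);\ (f', st) \leftarrow \mathrm{Encode}_{sk}(\mathbf{f});\ \mathbf{c} \leftarrow \mathbb{Z}_p^n \ :\ \big(\mathbf{c}, \mathcal{A}(st_{\mathcal{A}}, f', st, \mathbf{c})\big) \Big\}$$

and

$$\Big\{ (pk, sk) \leftarrow \mathrm{Gen}(1^k);\ (\mathbf{f}, st_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathrm{Encode}_{sk}(\cdot)}(pk);\ (f', st) \leftarrow \mathrm{Encode}_{sk}(\mathbf{f}) \ :\ \mathcal{K}_1^{\mathcal{A}(st_{\mathcal{A}}, f', st, \cdot)}(pk, st) \Big\}$$

are identical. (Above, $\mathcal{K}_1$ denotes the first output of $\mathcal{K}$.)

2. The following is negligible:

$$\Pr\Big[ (pk, sk) \leftarrow \mathrm{Gen}(1^k);\ (\mathbf{f}, st_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathrm{Encode}_{sk}(\cdot)}(pk);\ (f', st) \leftarrow \mathrm{Encode}_{sk}(\mathbf{f});\ ((\mathbf{c}, \pi), f^*) \leftarrow \mathcal{K}^{\mathcal{A}(st_{\mathcal{A}}, f', st, \cdot)}(pk, st) \ :\ \mathrm{Vrfy}(pk, st, \mathbf{c}, \pi) = 1 \wedge f^* \neq \mathbf{f} \Big].$$


3 From Homomorphic Identification Protocols to HLAs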


We now show how to transform any homomorphic identification protocol $\Sigma$ =
(Setup, Comm, Resp, Vrfy) into a public-key HLA. The basic idea is to use the file
blocks $f_1, \ldots, f_n$ as the “challenges” in $n$ parallel invocations of the identification
protocol. Thus, a very basic PoS would be as follows:


- The client computes $(pk, sk) \leftarrow \mathrm{Gen}(1^k)$.

- For each block $f_i$ of the file, the client computes $\alpha_i, \gamma_i$ such that $(\alpha_i, f_i, \gamma_i)$
  is an accepting transcript in the underlying identification scheme.

- The client sends to the server the file $\mathbf{f} = f_1 | \cdots | f_n$ and the tags $\gamma_1, \ldots, \gamma_n$;
  the client stores $\alpha_1, \ldots, \alpha_n$ as its own local state.


To verify that the server stores the $i$th block of the file, the client requests the
server to send $(f_i, \gamma_i)$; the client can authenticate this response by checking that
$(\alpha_i, f_i, \gamma_i)$ is an accepting transcript.

There are several drawbacks to the above approach. First, the client’s state is
linear in the file size.² This is easy to remedy by having the client generate each
$\alpha_i$ using a pseudorandom function (if private verifiability suffices) or a random
oracle (if public verifiability is desired, as here). A more serious problem is that
a server can easily “cheat” without being caught “too often” by throwing away
blocks of the file. If the server deletes, say, 1 block from the file then it is only
caught with probability $1/n$. This can be addressed, to some extent, by having
the client request many blocks, but then the communication complexity increases.
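
To make this trade-off concrete, the following sketch computes the probability of catching a server that has discarded some blocks. It assumes (our assumption, for illustration only) that the client requests distinct block indices chosen uniformly at random; the function name is hypothetical.

```python
from math import comb

def detection_probability(n: int, deleted: int, sampled: int) -> float:
    """Probability that at least one of `sampled` distinct, uniformly chosen
    block indices hits one of the `deleted` blocks of an n-block file."""
    if sampled > n - deleted:
        return 1.0  # more requests than surviving blocks: a deleted one must be hit
    # Complement of "all sampled indices fall among the n - deleted surviving blocks".
    return 1.0 - comb(n - deleted, sampled) / comb(n, sampled)

one_request = detection_probability(1000, 1, 1)      # the 1/n case from the text
many_requests = detection_probability(1000, 1, 100)  # 100 blocks per audit
```

With a single deleted block, requesting `t` blocks catches the server with probability `t/n`, so a constant detection probability needs a number of requested blocks (and hence communication) proportional to `n`.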

Instead, we rely on the homomorphic property of the identification scheme to
“batch” the authentication of multiple blocks. Specifically, the client will send
a random integer vector $\mathbf{c}$ and the server will respond with $\mu' := \sum_i c_i f_i$ and
$\gamma' := \mathrm{Combine}_3(\mathbf{c}, \boldsymbol{\gamma})$. This response can be verified by checking whether

$$\mathrm{Vrfy}\big(pk, \mathrm{Combine}_1(\mathbf{c}, \boldsymbol{\alpha}), \mu', \gamma'\big) = 1.$$

(See Figure 1.) Although the client-to-server communication is large, the server-to-client
communication is essentially independent of the file size (cf. footnote 1).
We reduce the client-to-server communication when we construct a PoS in the
next section.


Theorem 1. If $\Sigma$ is an unforgeable homomorphic identification protocol, then
$\Lambda$ as in Figure 1 is an unforgeable public-key HLA if $H$ is modeled as a random
oracle.


Proof. Correctness is easy to verify, and so we consider security. Let $\mathcal{A}$ be a PPT
adversary attacking $\Lambda$. We construct an adversary $\mathcal{A}'$ attacking $\Sigma$ as follows:


1. $\mathcal{A}'$ is given a public key $pk$, generates $B$ and $p$ in the obvious way, and runs
$\mathcal{A}(pk, p, B)$.


² In some cases linear state may be acceptable, as long as the state is a constant
fraction shorter than the file itself. When using certain homomorphic identification
schemes, including the one discussed in Section 5, this indeed can be achieved.

Let $\Sigma$ = (Setup, Comm, Resp, Vrfy) be a homomorphic identification protocol
and let $H$ be a function. Construct a public-key HLA $\Lambda$ =
(Gen, Tag, Auth, Vrfy) as follows:


$\mathrm{Gen}(1^k)$: Compute $(pk, sk) \leftarrow \Sigma.\mathrm{Setup}(1^k)$. Let $B$ be such that $[B]$ is
in the challenge space of $\Sigma$, and choose a $k$-bit prime $p$. Output the
public key $(pk, p, B)$ and secret key $sk$.

$\mathrm{Tag}_{sk}(\mathbf{f})$, where $\mathbf{f} = f_1 | \cdots | f_n$ and $f_i \in [B]$ for all $i$:
    1. Choose $st \leftarrow \{0,1\}^k$.
    2. For $1 \le i \le n$:
        a. Set $r_i := H(st; i)$ and $\alpha_i := \Sigma.\mathrm{Comm}(pk; r_i)$.
        b. Compute $\gamma_i := \Sigma.\mathrm{Resp}(pk, sk, r_i, f_i)$.
    3. Output $\mathbf{t} := (\gamma_1, \ldots, \gamma_n)$ and $st$.

$\mathrm{Auth}_{pk}(\mathbf{f}, \mathbf{t}, \mathbf{c})$: Compute and output $\tau := \Sigma.\mathrm{Combine}_3(\mathbf{c}, \mathbf{t})$.

$\mathrm{Vrfy}_{pk}(st, \mu, \mathbf{c}, \tau)$:
    1. For $1 \le i \le n$, set $r_i := H(st; i)$ and $\alpha_i := \Sigma.\mathrm{Comm}(pk; r_i)$.
    2. Output $\Sigma.\mathrm{Vrfy}(pk, \mathrm{Combine}_1(\mathbf{c}, \boldsymbol{\alpha}), \mu, \tau)$.


Fig. 1. Transforming a homomorphic identification protocol into an HLA


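2. When $\mathcal{A}$ requests $\mathrm{Tag}_{sk}(\mathbf{f})$ for $\mathbf{f} = f_1 | \cdots | f_n$, then (for $i = 1$ to $n$) $\mathcal{A}'$
queries $f_i$ to its own oracle and receives in return $(r_i, \gamma_i)$. Then $\mathcal{A}'$ chooses
random $st \in \{0,1\}^k$, sets answers to the random oracle appropriately (that is,
defines $H(st; i) := r_i$ for all $i$), and gives $(\gamma_1, \ldots, \gamma_n)$ and $st$ to $\mathcal{A}$.

3. Eventually, $\mathcal{A}$ outputs a file $\mathbf{f}$. Following this, $\mathcal{A}'$ outputs the vector of $n$
challenges $\mathbf{f} = f_1 | \cdots | f_n$, and receives in return $(\mathbf{r}, \boldsymbol{\gamma})$. Then $\mathcal{A}'$ chooses
random $st \in \{0,1\}^k$, sets³ answers to the random oracle appropriately, and
gives $(\boldsymbol{\gamma}, st)$ to $\mathcal{A}$.

4. When $\mathcal{A}$ finally outputs $\mathbf{c}, \mu', \tau'$, then $\mathcal{A}'$ outputs these same values.


It is easy to see that $\mathcal{A}$ succeeds in attacking $\Lambda$ exactly when $\mathcal{A}'$ succeeds in
attacking $\Sigma$.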


4 From HLAs to Efficient Proofs of Storage 


In this section we show how to use any HLA to construct a PoS having com- 
munication complexity independent of the file size. Our transformation is in the 
standard model. 

³ We assume for simplicity that no $st \in \{0,1\}^k$ is chosen twice throughout the experiment,
since this occurs with only negligible probability.

It is immediate how an HLA can be used to construct a PoS with communication
complexity linear in the file size: When storing a file $\mathbf{f}$, the client computes
tags on all the file blocks and gives to the server the vector of tags $\mathbf{t}$ (along with
$\mathbf{f}$ itself). To verify, the client chooses a random $\mathbf{c} \in \mathbb{Z}_p^n$ and sends it to the server;
the server responds with $\sum_i c_i f_i$ and $\mathrm{Auth}_{pk}(\mathbf{f}, \mathbf{t}, \mathbf{c})$ (which is authenticated by
the client in the obvious way). If authentication tags output by Auth have length
$O(k)$, then the server-to-client communication for an $n$-block file is bounded by

$$O(k) + \log\Big(\sum_i c_i f_i\Big) \le O(k) + \log(n \cdot p \cdot B) = O(k) + \log n.$$

For typical values of k,n, this means that the server-to-client communication is 

(essentially) independent of the file size. 
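
As a worked instance of this bound: each $c_i$ is below $p$ and each $f_i$ is at most $B$, so the integer $\mu = \sum_i c_i f_i$ is below $n \cdot p \cdot B$. The concrete parameter values in the second line are illustrative assumptions, not values taken from the paper.

```latex
\[
  \log_2 \mu \;<\; \log_2 n + \log_2 p + \log_2 B .
\]
% Assumed for illustration: n = 2^{30} blocks, a 128-bit prime p, and B = 2^{128}.
\[
  \log_2 \mu \;<\; 30 + 128 + 128 \;=\; 286 \ \text{bits}.
\]
```

Under these assumed parameters, doubling the number of blocks lengthens the response by a single bit, which is the sense in which the server-to-client communication is essentially independent of the file size.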

To reduce the client-to-server communication, we use a pseudorandom function
$F$: the client sends a key $K \in \{0,1\}^k$, and the server then derives the
challenge vector $\mathbf{c}$ by setting $c_i := F_K(i)$ for all $i$. (See Figure 2.) This approach
is, perhaps, quite “natural”,⁴ but it turns out to be highly non-trivial to prove
that it is sound. (This difficulty was mentioned in [45].) The issue is that since
the key $K$ is public, we cannot reduce to the security of the pseudorandom
function in the usual way. Instead we must use a more careful analysis.


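Let $\Lambda$ = (Gen, Tag, Auth, Vrfy) be a public-key HLA, and let $F$ be a pseudorandom
function. Construct a publicly-verifiable PoS $\Pi$ = (Gen, Encode,
Prove, Vrfy) as follows:


$\mathrm{Gen}(1^k)$: Compute and output $(pk, sk) \leftarrow \Lambda.\mathrm{Gen}(1^k)$. Let $p$ be the prime
implicit in $pk$.

$\mathrm{Encode}_{sk}(\mathbf{f})$: Compute $(\mathbf{t}, st) \leftarrow \Lambda.\mathrm{Tag}_{sk}(\mathbf{f})$, and output $f' = (\mathbf{f}, \mathbf{t})$
and $st$.

$\mathrm{Prove}(pk, f', K)$, where $K \in \{0,1\}^k$:
    1. Parse $f'$ as $(\mathbf{f}, \mathbf{t})$.
    2. For $1 \le i \le n$ let $c_i := F_K(i)$, where $c_i$ is viewed as an element
       of $\mathbb{Z}_p$.
    3. Compute $\tau := \Lambda.\mathrm{Auth}_{pk}(\mathbf{f}, \mathbf{t}, \mathbf{c})$ and $\mu := \sum_i c_i f_i$.
    4. Output $\pi := (\mu, \tau)$.

$\mathrm{Vrfy}(pk, st, K, \pi)$:
    1. Parse $\pi$ as $(\mu, \tau)$.
    2. For $1 \le i \le n$, let $c_i := F_K(i)$.
    3. Output $b := \Lambda.\mathrm{Vrfy}_{pk}(st, \mu, \mathbf{c}, \tau)$.


Fig. 2. Transforming an HLA into a PoS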
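
The challenge-derivation step shared by Prove and Vrfy in Figure 2 can be sketched as below. The paper leaves the pseudorandom function $F$ abstract; HMAC-SHA256 is used here purely as an assumed stand-in, the prime and the file blocks are toy values, and the function names are hypothetical. Reducing a 256-bit output modulo $p$ introduces a small bias that a real instantiation would need to account for.

```python
import hashlib
import hmac

def derive_challenge(key: bytes, n: int, p: int) -> list[int]:
    """Expand a short key K into c = (F_K(1), ..., F_K(n)), each viewed in Z_p.
    HMAC-SHA256 plays the role of the PRF F (an assumption of this sketch)."""
    out = []
    for i in range(1, n + 1):
        block = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        out.append(int.from_bytes(block, "big") % p)
    return out

def linear_combination(c: list[int], f: list[int]) -> int:
    """mu = sum_i c_i * f_i, computed over the integers."""
    return sum(ci * fi for ci, fi in zip(c, f))

p = 2**61 - 1                       # toy prime (a Mersenne prime), not a k-bit prime from Gen
key = bytes(16)                     # the short challenge K sent by the client (fixed here)
file_blocks = [3, 1, 4, 1, 5, 9, 2, 6]

c_server = derive_challenge(key, len(file_blocks), p)  # computed inside Prove
c_client = derive_challenge(key, len(file_blocks), p)  # recomputed inside Vrfy
mu = linear_combination(c_server, file_blocks)
```

Both parties obtain the same vector from the same short key, so the client sends only `key` instead of `n` elements of $\mathbb{Z}_p$.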


Theorem 2. Let $\Lambda$ be an unforgeable public-key HLA, and let $F$ be a pseudorandom
function secure against non-uniform polynomial-time adversaries. Then
$\Pi$ as in Figure 2 is a secure publicly-verifiable PoS.


Proof. Correctness of the construction is easily verified, and so we turn to proving
security. We describe a knowledge extractor $\mathcal{K}$ that runs in expected polynomial
time and satisfies Definition 6. Recall that $\mathcal{K}$ is given $pk, st$ as input and has
oracle access to $\mathcal{A}(st_{\mathcal{A}}, f', st, \cdot)$, which we abbreviate as $\mathcal{A}(\cdot)$. Define
$c(K) = (F_K(1), \ldots, F_K(n))$. The high-level structure of $\mathcal{K}$ is as follows:

⁴ A similar approach, based on pseudorandom generators, was proposed in the
context of verifiable shuffles.


1. $\mathcal{K}$ chooses random $K \leftarrow \{0,1\}^k$ and runs $\mathcal{A}(K)$ to obtain a proof $\pi$. If
$\mathrm{Vrfy}(pk, st, K, \pi) = 0$ then $\mathcal{K}$ outputs $((K, \pi), \bot)$ and stops. Otherwise, its
first output will still be $(K, \pi)$ but it attempts to recover the original file as
described next.

2. $\mathcal{K}$ repeatedly rewinds $\mathcal{A}$ and sends it different challenges until $\mathcal{A}$ responds
correctly to a total of $n$ challenges $K_1, \ldots, K_n$ such that $c(K_1), \ldots, c(K_n)$
are linearly independent (over $\mathbb{Q}$). Given $n$ successful responses to these $n$
challenges, $\mathcal{K}$ reconstructs a candidate file $f^*$ and outputs it.


The above neglects some technical details that we now formalize. If $\mathcal{A}(K)$ outputs
a proof $\pi = (\mu, \tau)$ for which $\mathrm{Vrfy}_{pk}(st, \mu, c(K), \tau) = 1$, then we say that $K$ is a
good challenge. $\mathcal{K}$ implements step 2, above, as follows:


1. Initialize sets $\mathrm{Good}_K := \mathrm{Good}_c := \emptyset$. Keep track of the total number of calls
to $\mathcal{A}$, and halt execution with output fail if $2^k$ calls are made.

2. Estimate the probability $p^*$ with which a random key $K$ is good by running
$\mathcal{A}$ with a random challenge until some fixed polynomial number $q = q(k)$ of
successful verifications occur. By appropriate choice of $q$, it is possible to
ensure that the estimate $\tilde{p}$ is within a factor of 2 of the true probability $p^*$
with all but negligible probability $2^{-k}$.

3. For $j = 1$ to $n$ do:

   - Repeatedly sample $K_j$ uniformly, querying $\mathcal{A}$ on each one, until a good
     $K_j$ with $c(K_j) \notin \mathrm{span}(\mathrm{Good}_c)$ is found. If found, then add $K_j$ to $\mathrm{Good}_K$
     and add $\mathbf{c}_j = c(K_j)$ to $\mathrm{Good}_c$, and go to the next value of $j$. If no such
     $K_j$ is found in at most $k^2/\tilde{p}$ tries, then output fail and halt.

4. Let $\mathrm{Good}_K = \{K_1, \ldots, K_n\}$ and $\mathrm{Good}_c = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$, where $\mathbf{c}_j = c(K_j)$,
and let $\pi_j = (\mu_j, \tau_j)$ be the output of $\mathcal{A}(K_j)$. Set up the system of linear
equations $\{\sum_i c_{j,i} \cdot f_i = \mu_j\}_{1 \le j \le n}$ in the unknowns $\mathbf{f} = (f_1, \ldots, f_n)$. Solve
for $\mathbf{f}$ (over the integers) and output it.


We refer to the above as the extraction subroutine. 
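
Step 4 of the extraction subroutine amounts to solving an $n \times n$ linear system whose rows are the linearly independent challenge vectors. A minimal sketch, assuming exact rational arithmetic via Gauss-Jordan elimination (the paper does not prescribe a particular solver, and the function name and toy values are hypothetical):

```python
from fractions import Fraction

def solve_over_rationals(C, mu):
    """Gauss-Jordan elimination with exact rational arithmetic.
    C is an n x n list of integer rows, assumed linearly independent over Q;
    mu is the list of n right-hand sides. Returns the unique solution."""
    n = len(C)
    # Augmented matrix [C | mu] with Fraction entries.
    M = [[Fraction(x) for x in row] + [Fraction(m)] for row, m in zip(C, mu)]
    for col in range(n):
        # A nonzero pivot exists because the rows are assumed independent.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

f_true = [7, 0, 42]                         # toy file blocks
C = [[2, 5, 1], [1, 1, 3], [4, 0, 9]]       # toy challenge vectors c(K_1), c(K_2), c(K_3)
mu = [sum(c * x for c, x in zip(row, f_true)) for row in C]  # honest responses mu_j
f_rec = solve_over_rationals(C, mu)
```

When every response $\mu_j$ is the honest value $\sum_i c_{j,i} f_i$, the unique rational solution is the integer vector $\mathbf{f}$ itself, which is why unforgeability of the HLA is what guarantees the extracted file is correct.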

To complete the proof, we need to show three things. First, that K runs in 
expected polynomial time for any A. Second, that if A successfully convinces a 
verifier in the PoS protocol with sufficiently high probability, then the extraction 
procedure will successfully complete (specifically, step 3 will be successful) with 
overwhelming probability. Third, that with overwhelming probability the file $f^*$
output by the extraction procedure is indeed equal to the true file $\mathbf{f}$. The first
and third of these items are essentially standard. The second step would be 
relatively straightforward if the challenge in the PoS protocol were a random 
vector c; what makes it more complicated is that the challenge is a PRF key K 
that is expanded to a vector $\mathbf{c} = c(K)$.

Fixing $st_{\mathcal{A}}$, $f'$, and $st$, we let $p^*$ denote the probability that a random challenge
$K$ is good; i.e., this is the probability with which $\mathcal{A}(st_{\mathcal{A}}, f', st, \cdot)$ responds
correctly to the verifier’s challenge (we assume $st_{\mathcal{A}}$ includes $\mathcal{A}$’s coins).


Claim. K runs in expected polynomial time. 


Proof. If p* = 0 then it is clear that K runs in expected polynomial time. 
So assume p* > 0. We must then analyze the expected running time of the 
extraction procedure, following BIZ. Steps 1 and 4 take strict polynomial time. 
The expected running time of step 2 is exactly (some polynomial times) $q(k)/p^*$.
As for step 3, there are two cases: If the estimate $\tilde{p}$ from step 2 satisfies $\tilde{p} < p^*/2$, then the only thing we can claim
is that the running time is bounded by (some polynomial times) $2^k$, due to the
counter being maintained in step 1. But the probability that $\tilde{p} < p^*/2$ is at
most $2^{-k}$. On the other hand, if $\tilde{p} \ge p^*/2$ then the expected running time of
step 3 is at most (some polynomial times) $n \cdot k^2/\tilde{p} \le 2nk^2/p^*$.

$\mathcal{K}$ only runs the extraction procedure with probability $p^*$. Thus, the overall
expected running time of $\mathcal{K}$ is upper-bounded by

$$p^* \cdot \Big( \mathrm{poly}(k) + \mathrm{poly}(k) \cdot q(k)/p^* + \mathrm{poly}(k) \cdot 2^k \cdot 2^{-k} + \mathrm{poly}(k) \cdot 2nk^2/p^* \Big),$$

which is polynomial.


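Claim. There exists a negligible function $\epsilon(\cdot)$ such that if $p^* > \epsilon(k)$ then the
probability (conditioned on the extraction procedure being run) that the extraction
procedure outputs fail is negligible.


Observe this implies that

$$\Pr\Big[ (pk, sk) \leftarrow \mathrm{Gen}(1^k);\ (\mathbf{f}, st_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathrm{Encode}_{sk}(\cdot)}(pk);\ (f', st) \leftarrow \mathrm{Encode}_{sk}(\mathbf{f});\ ((\mathbf{c}, \pi), f^*) \leftarrow \mathcal{K}^{\mathcal{A}(st_{\mathcal{A}}, f', st, \cdot)}(pk, st) \ :\ \mathrm{Vrfy}(pk, st, \mathbf{c}, \pi) = 1 \wedge f^* = \mathsf{fail} \Big]$$

is negligible.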


Proof. We view the $\mathbf{c}_j = c(K_j)$ as vectors over $\mathbb{Z}_p$, and use the fact that integer
vectors $\mathbf{c}_1, \ldots, \mathbf{c}_\ell$ with entries in the range $\{0, \ldots, p-1\}$ are linearly dependent
over $\mathbb{Q}$ only if they are linearly dependent over $\mathbb{Z}_p$; thus, an upper bound on the
probability of the latter implies an upper bound on the probability of the former.
Define

$$\epsilon'(k) = \max_{L} \Big\{ \Pr\big[K \leftarrow \{0,1\}^k : c(K) \in L\big] \Big\},$$

where the maximum is taken over all $(n-1)$-dimensional subspaces $L \subset \mathbb{Z}_p^n$. It
is not hard to see that if $F$ is a non-uniformly secure PRF then $\epsilon'(k) - 1/p$ is
negligible. Since $1/p$ is negligible, we see that $\epsilon'$ is negligible too. Take $\epsilon = 2\epsilon'$.
We show that if $p^* > \epsilon$ then, conditioned on the extraction procedure being run,
the probability that it outputs fail is negligible.

First, observe that the probability that $\mathcal{K}$ times out by virtue of running
for $2^k$ steps is negligible (this follows from the fact that the expected running
time of $\mathcal{K}$ is polynomial). Next, fix any $j$ and consider step 3. The number of
challenges that are good is exactly $p^* \cdot 2^k$, and the number of challenges $K_j$ for
which $c(K_j)$ lies in $\mathrm{span}(\mathrm{Good}_c)$ (which has dimension at most $n-1$) is at most
$\epsilon' \cdot 2^k < p^* \cdot 2^k/2$. Thus, the probability that a random $K_j$ is both good and does
not lie in $\mathrm{span}(\mathrm{Good}_c)$ is at least $p^*/2$. If the estimate $\tilde{p}$ is within a factor of 2 of $p^*$, which
occurs with all but negligible probability, then $\mathcal{K}$ finds such a $K_j$ within $k^2/\tilde{p}$
steps with all but negligible probability; a union bound over all values of $j \in [n]$
then shows that it fails in some iteration with only negligible probability. This
completes the proof.


Finally, we show that the probability that the extraction procedure outputs an
incorrect file is negligible. In conjunction with the previous claims, this completes
the proof that $\mathcal{K}$ satisfies Definition 6.


Claim. For any PPT adversary $\mathcal{A}$, the following is negligible:

$$\Pr\Big[ (pk, sk) \leftarrow \mathrm{Gen}(1^k);\ (\mathbf{f}, st_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathrm{Encode}_{sk}(\cdot)}(pk);\ (f', st) \leftarrow \mathrm{Encode}_{sk}(\mathbf{f});\ ((\mathbf{c}, \pi), f^*) \leftarrow \mathcal{K}^{\mathcal{A}(st_{\mathcal{A}}, f', st, \cdot)}(pk, st) \ :\ \mathrm{Vrfy}(pk, st, \mathbf{c}, \pi) = 1 \wedge f^* \notin \{\mathsf{fail}, \mathbf{f}\} \Big].$$


Proof. The event in question can only occur if, at the end of the extraction
procedure, there exists $\mathbf{c} \in \mathrm{Good}_c$, with $\mathbf{c} = c(K)$, for which $\mathcal{A}(K)$ outputs $(\mu, \tau)$
such that $\mathrm{Vrfy}(pk, st, K, (\mu, \tau)) = 1$ yet $\mu \neq \sum_i c_i f_i$. But this exactly means
that $\mathcal{A}$ has violated the assumed unforgeability of $\Lambda$. Since $\mathcal{K}$ runs in expected
polynomial time, it follows by a standard argument that this occurs with only
negligible probability.


This concludes the proof of Theorem 2.


5 A Concrete Instantiation Based on Factoring 


In this section we describe a homomorphic variant of the identification protocol 
of Shoup [I5], whose security is based on the hardness of factoring. Together with 
the transformations described in the previous sections, this yields a factoring- 
based PoS in the random oracle model. 

Protocol $\Sigma_{\mathrm{Shoup}}$, described in Figure 3, relies on a Blum modulus generator
$\mathrm{Gen}_{\mathrm{Blum}}$ that takes as input a security parameter $1^k$ and outputs a tuple $(N, p, q)$
such that $N = p \cdot q$ where $p$ and $q$ are $k$-bit primes with $p \equiv q \equiv 3 \bmod 4$. We
denote by $QR_N$ the set of quadratic residues modulo $N$, and by $J_N^{+1}$ the elements
of $\mathbb{Z}_N^*$ with Jacobi symbol $+1$. We use the following standard facts regarding
Blum integers: (1) given $x \in \mathbb{Z}_N^*$ it can be efficiently decided whether $x \in J_N^{+1}$;
(2) if $x \in J_N^{+1}$, then exactly one of $x$ or $-x$ is in $QR_N$; (3) every $x \in QR_N$ has
four square roots, exactly one of which is itself in $QR_N$.

Define homomorphic identification protocol $\Sigma_{\mathrm{Shoup}}$ as follows:


$\mathrm{Setup}(1^k)$: Generate $(N, p, q) \leftarrow \mathrm{Gen}_{\mathrm{Blum}}(1^k)$. Choose $y \leftarrow QR_N$, and
output $pk := (N, y)$ and $sk := (p, q)$.

$\mathrm{Comm}(pk; r)$: View $r$ as an element of $J_N^{+1}$ and output $\alpha := r$.

$\mathrm{Resp}(pk, sk, r, \beta)$: Let $\beta \in \mathbb{Z}_{2^k}$ (which defines the challenge space). Output
$\gamma$, a random $2^{3k}$th root of $\pm r \cdot y^{\beta} \bmod N$ (where the sign is chosen
to ensure that a square root exists).

$\mathrm{Vrfy}(pk, \alpha, \beta, \gamma)$: Output 1 iff $\gamma^{2^{3k}} \stackrel{?}{=} \pm\alpha \cdot y^{\beta} \bmod N$ and $\beta < 2^{3k}$.


$\mathrm{Combine}_1$ and $\mathrm{Combine}_3$ are defined as follows:

- Let $\mathbf{c} \in \mathbb{Z}_{2^k}^n$ and $\boldsymbol{\alpha} \in (\mathbb{Z}_N^*)^n$. Then $\mathrm{Combine}_1(\mathbf{c}, \boldsymbol{\alpha}) = \prod_{i=1}^{n} \alpha_i^{c_i} \bmod N$.

- Let $\mathbf{c} \in \mathbb{Z}_{2^k}^n$ and $\boldsymbol{\gamma} \in (\mathbb{Z}_N^*)^n$. Then $\mathrm{Combine}_3(\mathbf{c}, \boldsymbol{\gamma}) = \prod_{i=1}^{n} \gamma_i^{c_i} \bmod N$.


Fig. 3. A homomorphic identification protocol based on factoring


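Correctness of $\Sigma_{\mathrm{Shoup}}$ as a stand-alone identification protocol is immediate.
Let us verify that it is homomorphic. Fix public key $(N, y)$, challenge vector
$\mathbf{c} \in \mathbb{Z}_{2^k}^n$, and $\{(\alpha_i, \beta_i, \gamma_i)\}_{1 \le i \le n}$ such that $\gamma_i^{2^{3k}} = \pm\alpha_i \cdot y^{\beta_i} \bmod N$ for all $i$. Then

$$\mathrm{Combine}_3(\mathbf{c}, \boldsymbol{\gamma})^{2^{3k}} = \prod_{i=1}^{n} \big(\gamma_i^{2^{3k}}\big)^{c_i} \bmod N = \pm\mathrm{Combine}_1(\mathbf{c}, \boldsymbol{\alpha}) \cdot y^{\sum_i c_i \beta_i} \bmod N,$$

and furthermore $\sum_i c_i \beta_i < n \cdot 2^k \cdot 2^k < 2^{3k}$.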


Theorem 3. X'Shoup is an unforgeable homomorphic identification protocol if the 
factoring assumption holds with respect to Gengum. 


Proof. The high-level ideas are similar to those in [I5], though the proof here 
is a bit simpler. Given a PPT adversary A attacking Yshoup, we construct a PPT 
algorithm 6 computing square roots modulo N output by Gengium. This implies 
factorization of N in the standard way. Algorithm B works as follows: 


- B is given a Blum modulus $N$ and a random $y \in QR_N$. It runs A on the public key pk = $(N, y)$.

- When A outputs $\beta' \in \mathbb{Z}_{2^k}$, then B chooses random $\gamma \in \mathbb{Z}_N$ and $b \in \{0, 1\}$, and sets $r := \alpha := (-1)^b \cdot \gamma^{2^{3k}} / y^{\beta'} \bmod N$. It then gives $(r, \gamma)$ to A.

- When A outputs an n-vector of challenges $\beta$, then for each $i$ algorithm B computes $(r_i, \gamma_i)$ as in the previous step. It gives $(r, \gamma)$ to A.

- If A outputs $(c, \mu', \gamma')$ with Vrfy(pk, Combine$_1(c, \alpha)$, $\mu'$, $\gamma'$) = 1 but $\mu' \ne \sum_i c_i \beta_i$, then B computes a square root of $y$ as described below.


Note that the simulation provided for A by B is perfect, and so A succeeds in the above with the same probability with which it succeeds in attacking the real-world protocol $\Sigma_{\mathrm{Shoup}}$.

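To complete the proof, we describe the final step in more detail. Define

$$\alpha^* = \mathrm{Combine}_1(c, \alpha), \qquad \gamma^* = \mathrm{Combine}_3(c, \gamma), \qquad \mu = \sum_i c_i \beta_i.$$

If Vrfy(pk, $\alpha^*$, $\mu'$, $\gamma'$) = 1 but $\mu' \ne \mu$, then $(\gamma')^{2^{3k}} = \pm\alpha^* \cdot y^{\mu'} \bmod N$; furthermore, B also knows that $(\gamma^*)^{2^{3k}} = \pm\alpha^* \cdot y^{\mu} \bmod N$. Assume without loss of generality that $\mu > \mu'$. Since $y \in QR_N$ this implies

$$(\gamma^*/\gamma')^{2^{3k}} = y^{\mu - \mu'} \bmod N \qquad (1)$$

with $\mu, \mu' < 2^{3k}$ (and so $\mu - \mu' < 2^{3k}$). Write $\mu - \mu' = f \cdot 2^t$ for $t < 3k$ and $f$ odd. Since squaring is a permutation of $QR_N$, Equation (1) implies

$$(\gamma^*/\gamma')^{2^{3k-t}} = y^{f} \bmod N.$$

Using the extended Euclidean algorithm, B computes integers $A, B$ such that $Af + B \cdot 2^{3k-t} = 1$. Then

$$\Big((\gamma^*/\gamma')^{A} \, y^{B}\Big)^{2^{3k-t}} = \big(y^{f}\big)^{A} \, y^{B \cdot 2^{3k-t}} = y^{Af + B \cdot 2^{3k-t}} = y \bmod N,$$

and so B can compute a square root of $y$, namely $\big((\gamma^*/\gamma')^{A} y^{B}\big)^{2^{3k-t-1}}$, which is well defined because $3k - t \ge 1$. Since B computes a square root whenever A succeeds, the success probability of A must be negligible.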
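The root-extraction step can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `egcd` and `sqrt_from_relation`, the toy modulus, and the sample values are assumptions. Given $X$ with $X^{2^m} = y^f \bmod N$ for odd $f$ and $m \ge 1$ (here $m$ plays the role of $3k - t$ and $X$ the role of $\gamma^*/\gamma'$), it returns a square root of $y$. Negative exponents in `pow` require Python 3.8 or later.

```python
def egcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def sqrt_from_relation(X, y, f, m, N):
    """Given X^(2^m) = y^f (mod N) with f odd and m >= 1, return a square root of y."""
    g, A, B = egcd(f, 2 ** m)          # A*f + B*2^m = 1 because f is odd
    assert g == 1
    base = pow(X, A, N) * pow(y, B, N) % N
    return pow(base, 2 ** (m - 1), N)  # squaring this value gives y


# Toy instance (hypothetical values): N = 77, y = s^(2^m), X = s^f.
N, s, m, f = 77, 2, 3, 5
y = pow(s, 2 ** m, N)
X = pow(s, f, N)
root = sqrt_from_relation(X, y, f, m, N)
```

In the toy instance the relation is built from a known `s`, so the output can be checked directly; in the reduction, B has no such shortcut and obtains the relation only from the adversary's forgery.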


Acknowledgments. We are grateful to Gene Tsudik for his insightful com- 
Simple Adaptive Oblivious Transfer without Random Oracle

Abstract. Adaptive oblivious transfer (OT) is a two-party protocol which simulates an ideal world such that the sender sends $M_1, \ldots, M_n$ to the trusted third party (TTP), and the receiver receives $M_{\sigma_i}$ from TTP adaptively for $i = 1, 2, \ldots, k$. This paper shows the first pairing-free fully simulatable adaptive OT. It is also the first fully simulatable scheme which does not rely on dynamic assumptions. Indeed our scheme holds under the DDH assumption.


Keywords: Adaptive OT, Fully Simulatable, DDH, Standard Model. 


1 Introduction 


In a non-adaptive (k,n) oblivious transfer (OT) scheme, which is denoted by $OT^n_k$, a sender has n secret strings $M_1, \ldots, M_n$, and a receiver has k secret choice indices $\sigma_1, \ldots, \sigma_k \in \{1, \ldots, n\}$. At the end of the protocol, the receiver learns $M_{\sigma_1}, \ldots, M_{\sigma_k}$ (only), and the sender learns nothing on $\sigma_1, \ldots, \sigma_k$. Efficient OT schemes are important because $OT^n_k$ is a key building block for secure multi-party computation [ZOM].

In an adaptive (k,n) oblivious transfer protocol, which is denoted by $OT^n_{k\times 1}$, the receiver chooses $\sigma_i$ adaptively depending on $M_{\sigma_1}, \ldots, M_{\sigma_{i-1}}$ [Ld]. In other words, $OT^n_{k\times 1}$ is a two-party protocol (S, R) which simulates an ideal world protocol (S', R') such that


1. the sender S' sends $M_1, \ldots, M_n$ to the trusted third party (TTP), and
2. the receiver R' receives $M_{\sigma_i}$ from TTP adaptively for $i = 1, 2, \ldots, k$, where the receiver chooses $\sigma_i$ based on $M_{\sigma_1}, \ldots, M_{\sigma_{i-1}}$.


Adaptive OT also has wide applications, such as oblivious database searches and secure multiparty computation.

As a security notion of OT (for both non-adaptive and adaptive), half simulatability was considered until recently. This definition requires


- (Sender's privacy.) For any receiver $\hat{R}$ in the real world, there exists a receiver $\hat{R}'$ in the ideal world such that the outputs of $\hat{R}$ and $\hat{R}'$ are indistinguishable.
- (Receiver's privacy.) For any input to the receiver, the view of the sender must be indistinguishable. (Note that the honest sender outputs nothing.)


M. Matsui (Ed.): ASIACRYPT 2009, LNCS 5912, pp. 334-346, 2009.
© International Association for Cryptologic Research 2009 


Simple Adaptive Oblivious Transfer without Random Oracle 335 


However, Naor and Pinkas noticed that there can be a practical attack on a half simulatable adaptive OT [15].

To solve this problem, Camenisch, Neven and shelat formalized a notion of full simulatability [2]. In this definition, we consider a pair of outputs of the sender and the receiver. Although the honest sender outputs nothing, a malicious sender may output its view in the execution of the protocol. Full simulatability now requires that


- (Sender's privacy) For any receiver $\hat{R}$ in the real world, there exists a receiver $\hat{R}'$ in the ideal world such that $(S'_{out}, \hat{R}'_{out})$ is indistinguishable from $(S_{out}, \hat{R}_{out})$, where $A_{out}$ denotes the output of A.

- (Receiver's privacy) For any sender $\hat{S}$ in the real world, there exists a sender $\hat{S}'$ in the ideal world such that $(\hat{S}'_{out}, R'_{out})$ is indistinguishable from $(\hat{S}_{out}, R_{out})$.


They then showed a fully simulatable adaptive OT in the random oracle model, and one in the standard model, respectively [2].

We focus on the standard model in this paper.¹ All fully simulatable adaptive OT schemes known so far have been constructed based on pairing, and they rely on dynamic assumptions such as the q-strong DH assumption. For example, Camenisch et al.'s $OT^n_{k\times 1}$ relies on the q-strong DH assumption and the q-PDDH assumption. Green and Hohenberger's $OT^n_{k\times 1}$ relies on the q-hidden LRSW assumption P]. (This scheme achieves UC security.) Jarecki and Liu's $OT^n_{k\times 1}$ relies on the decisional q-DHI assumption [10].

This paper shows the first pairing-free fully simulatable adaptive OT. It is also the first fully simulatable scheme which does not rely on dynamic assumptions. Indeed our scheme holds under the DDH assumption. While the previous schemes use a signature scheme as a building block,² our scheme utilizes the ElGamal encryption scheme. (Hence we do not need a pairing.)

Our scheme is conceptually very simple and efficient. The initialization phase 
and each transfer phase are constant round protocols. Thus the total round 
complexity is proportional to k. 

Finally we extend our scheme to a fully simulatable non-adaptive OT which requires constant rounds. Green and Hohenberger showed a fully simulatable non-adaptive $OT^n_k$ based on pairing under the decisional BDH assumption [8]. On the other hand, our $OT^n_k$ is pairing-free and relies on the DDH assumption.

Lindell showed a fully simulatable $OT^2_1$ under the DDH, Paillier's decisional Nth residuosity, and quadratic residuosity assumptions, as well as under the assumption that homomorphic encryption exists [13]. (He claimed that they can be extended to $OT^n_k$.) Under the DDH assumption, our $OT^2_1$ is more efficient than Lindell's scheme [13].


1 In the random oracle model, Ogata and Kurosawa showed an adaptive OT based on Chaum's blind signature scheme [18]. Camenisch, Neven and shelat [2] proved that it is fully simulatable, and they also corrected a flaw of [18]. Green and Hohenberger showed a scheme under the decisional BDH assumption [8].

2 Maybe because an adaptive OT shown by Ogata and Kurosawa [18] utilizes Chaum's blind signature scheme.
Table 1. Fully simulatable adaptive OT without RO

Scheme                  | Pairing | Dynamic assumption | Assumption
Camenisch et al. [2]    | yes     | yes                | q-strong DH and q-PDDH
Green and Hohenberger   | yes     | yes                | q-hidden LRSW (UC secure)
Jarecki and Liu [10]    | yes     | yes                | q-DHI
Proposed                | no      | no                 | DDH


2 Preliminaries 


2.1 Notations 


In this paper, we denote a security parameter by $\tau \in \mathbb{N}$. All the algorithms take $\tau$ as the first input and run in (expected) polynomial-time in $\tau$. We denote probabilistic polynomial-time by PPT for short. We often do not write the security parameter explicitly.


2.2 Proof Systems 


To design our scheme, we use several proof systems. We follow the definitions described in HSB].

Let $R = \{(\alpha, \beta)\} \subseteq \{0,1\}^* \times \{0,1\}^*$ be a binary relation such that $|\beta| \le \mathrm{poly}(|\alpha|)$ for all $(\alpha, \beta) \in R$, where poly is some polynomial. We only consider relations $R$ such that $(\alpha, \beta) \in R$ can be decided in time polynomial in $|\alpha|$ for all $(\alpha, \beta)$. We define $L_R = \{\alpha \mid \exists \beta \text{ such that } (\alpha, \beta) \in R\}$.


Proof of Membership (PoM): A pair of interacting algorithms (P, V), called a prover and a verifier, is a proof of membership (PoM) for a relation $R$ if completeness and soundness are satisfied. Here, we say that (P, V) satisfies completeness if for all $(\alpha, \beta) \in R$, the probability of $V(\alpha)$ accepting a conversation with $P(\alpha, \beta)$ is 1. Also we say that (P, V) satisfies soundness if for all $\alpha \notin L_R$ and all $P^*(\alpha)$ (including cheating provers), the probability of $V(\alpha)$ accepting the conversation with $P^*$ is negligible in $|\alpha|$. We call this probability the soundness error of the proof system.


Proof of Knowledge (PoK): We say a pair of interacting algorithms (P, V) is a PoK for a relation $R$ with knowledge error $\kappa \in [0,1]$ if it satisfies completeness as described above and has an expected polynomial-time algorithm $E$, called a knowledge extractor. Here, the algorithm $E$ is a knowledge extractor for a relation $R$ if, whenever a possibly cheating $P$ has probability $\epsilon$ of convincing $V$ to accept $\alpha$, then $E$, when given black-box access to $P$, outputs a witness $\beta$ for $\alpha$ with probability $\epsilon - \kappa$.


Witness Indistinguishability (WI): A proof system (P, V) is perfect WI if for every $(\alpha, \beta_1), (\alpha, \beta_2) \in R$, and any PPT cheating verifier, the output of $V(\alpha)$ (including cheating verifier) after interacting with $P(\beta_1)$ and that of $V(\alpha)$ after interacting with $P(\beta_2)$ are identically distributed.

Zero Knowledge (ZK): We say that a proof system (P, V) is perfect ZK if there exists an expected polynomial-time algorithm Sim, called a simulator, such that for any PPT cheating verifier $V$ and any $(\alpha, \beta) \in R$, the output of $V(\alpha)$ after interacting with $P(\beta)$ and that of $\mathrm{Sim}^{V(\alpha)}(\alpha)$ are identically distributed.


3 k-Out-of-n Oblivious Transfer 


In this section, we present a UC-like definition of fully simulatable non-adaptive OT. Similarly, we present a UC-like definition of fully simulatable adaptive OT. We consider a weak model of the UC framework as follows.


- At the beginning of the game, an adversary A can corrupt either a sender S or a receiver R, but not both.

- A can send a message (which will be denoted by $A_{out}$) to an environment Z after the end of the protocol. (A cannot communicate with Z during the protocol execution.)


The ideal functionalities of $OT^n_k$ and $OT^n_{k\times 1}$ will be shown below. For a protocol $\pi = (S, R)$, define Adv(Z) as


$$\mathrm{Adv}(Z) = |\Pr(Z = 1 \text{ in the real world}) - \Pr(Z = 1 \text{ in the ideal world})|$$


3.1 Non-adaptive k-Out-of-n Oblivious Transfer 


In the ideal world of $OT^n_k$, the ideal functionality $F_{non}$, an ideal world adversary A' and an environment Z behave as follows.


(Initialization phase:)

1. An environment Z sends $(M_1, \ldots, M_n)$ to the dummy sender S'.

2. S' sends $(M_1^*, \ldots, M_n^*)$ to $F_{non}$, where $(M_1^*, \ldots, M_n^*) = (M_1, \ldots, M_n)$ if S' is not corrupted.


(Transfer phase:)


1. Z sends $(\sigma_1, \ldots, \sigma_k)$ to the dummy receiver R', where $1 \le \sigma_i \le n$.
2. R' sends $(\sigma_1^*, \ldots, \sigma_k^*)$ to $F_{non}$, where $(\sigma_1^*, \ldots, \sigma_k^*) = (\sigma_1, \ldots, \sigma_k)$ if R' is not corrupted.
3. $F_{non}$ sends received to an ideal process adversary A'.
4. A' sends $b = 1$ or $0$ to $F_{non}$, where $b = 1$ if S' is not corrupted.
5. $F_{non}$ sends $Y$ to R', where

$$Y = \begin{cases} (M^*_{\sigma_1^*}, \ldots, M^*_{\sigma_k^*}) & \text{if } b = 1 \\ \bot & \text{if } b = 0 \end{cases}$$

6. R' sends $Y$ to Z.


After the end of the protocol, A' sends a message $A'_{out}$ to Z. Finally Z outputs 1 or 0.
In the real world, a protocol (S, R) is executed without $F_{non}$, where the environment Z and a real world adversary A behave in the same way as above.


Definition 1. We say that (S, R) is secure against the sender (receiver) corruption if for any real world adversary A who corrupts the sender S (the receiver R), there exists an ideal world adversary A' who corrupts the dummy sender S' (the dummy receiver R') such that for any environment Z, Adv(Z) is negligible.


Definition 2. We say that (S, R) is a fully simulatable $OT^n_k$ if it is secure against the sender corruption and the receiver corruption.


3.2 Adaptive k-Out-of-n Oblivious Transfer 


In the ideal world of $OT^n_{k\times 1}$, the ideal functionality $F_{adapt}$, an ideal world adversary A' and an environment Z behave as follows.


(Initialization phase:)


1. An environment Z sends $(M_1, \ldots, M_n)$ to the dummy sender S'.
2. S' sends $(M_1^*, \ldots, M_n^*)$ to $F_{adapt}$, where $(M_1^*, \ldots, M_n^*) = (M_1, \ldots, M_n)$ if S' is not corrupted.


(Transfer phase:) For $i = 1, \ldots, k$,


1. Z sends $\sigma_i$ to the dummy receiver R', where $1 \le \sigma_i \le n$.
2. R' sends $\sigma_i^*$ to $F_{adapt}$, where $\sigma_i^* = \sigma_i$ if R' is not corrupted.
3. $F_{adapt}$ sends received to an ideal process adversary A'.
4. A' sends $b = 1$ or $0$ to $F_{adapt}$, where $b = 1$ if S' is not corrupted.
5. $F_{adapt}$ sends $Y_i$ to R', where

$$Y_i = \begin{cases} M^*_{\sigma_i^*} & \text{if } b = 1 \\ \bot & \text{if } b = 0 \end{cases}$$

6. R' sends $Y_i$ to Z.


After the end of the protocol, A' sends a message $A'_{out}$ to Z. Finally Z outputs 1 or 0.

In the real world, a protocol (S, R) is executed without $F_{adapt}$, where the environment Z and a real world adversary A behave in the same way as above.
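The behaviour of the adaptive ideal functionality can be made concrete with a small model. This is only a sketch under assumptions: the class name `IdealAdaptiveOT`, its method names, and the use of `None` for $\bot$ are hypothetical, and the environment, the dummy parties and the adversary's "received" notification are not modelled.

```python
class IdealAdaptiveOT:
    """Toy model of F_adapt: holds the sender's messages and answers k queries one at a time."""

    def __init__(self, messages, k):
        self.messages = list(messages)   # (M*_1, ..., M*_n) from the initialization phase
        self.remaining = k               # number of transfer phases still allowed

    def transfer(self, sigma, b=1):
        """One transfer phase: return M*_sigma if b == 1, or None (standing for ⊥) if b == 0."""
        if self.remaining == 0:
            raise RuntimeError("all k transfer phases have been used")
        if not 1 <= sigma <= len(self.messages):
            raise ValueError("choice index out of range")
        self.remaining -= 1
        return self.messages[sigma - 1] if b == 1 else None


# The receiver may pick each index after seeing the previous answers.
functionality = IdealAdaptiveOT(["m1", "m2", "m3"], k=2)
first = functionality.transfer(2)
second = functionality.transfer(3 if first == "m2" else 1)
```

The second query depends on the first answer, which is exactly what distinguishes the adaptive setting from the non-adaptive one, where all indices are fixed up front.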


Definition 3. We say that (S,R) is secure against the sender (receiver) corrup- 
tion if for any real world adversary A who corrupts the sender S (the receiver 
R), there exists an ideal world adversary A’ who corrupts the dummy sender S' 
(the dummy receiver R') such that for any environment Z, Adv(Z) is negligible. 


Definition 4. We say that (S, R) is a fully simulatable $OT^n_{k\times 1}$ if it is secure against the sender corruption and the receiver corruption.
3.3 Remarks


Our definition of fully simulatable adaptive OT is weaker than UC security because our adversaries A cannot communicate with Z during the protocol execution. On the other hand, it is stronger than that of [2], which is not UC-like. In our definition, Z chooses $\sigma_i$. Hence $\sigma_i$ can depend on all of $(M_1, \ldots, M_n)$. In the definition of [2], the receiver chooses $\sigma_i$. Hence $\sigma_i$ can depend on $(M_{\sigma_1}, \ldots, M_{\sigma_{i-1}})$ only.


4 Our Fully Simulatable Adaptive OT 


In this section, we show an adaptive $OT^n_{k\times 1}$ based on the ElGamal encryption scheme, and prove its full simulatability under the DDH assumption.

Let $G$ be a multiplicative group of prime order $q$. Then the DDH assumption states that, for every PPT distinguisher $D$,


$$\epsilon_{DDH}(D) = |\Pr(D(g, g^{\alpha}, g^{\beta}, g^{\alpha\beta}) = 1) - \Pr(D(g, g^{\alpha}, g^{\beta}, g^{\gamma}) = 1)|$$

is negligible, where the probability is taken over the random bits of $D$, the random choice of the generator $g$, and the random choice of $\alpha, \beta, \gamma \in \mathbb{Z}_q$. We denote

$$\epsilon_{DDH} = \max\{\epsilon_{DDH}(D)\},$$

where the maximum is taken over all PPT distinguishers $D$.

The initialization phase and each transfer phase are constant round protocols. Hence the total round complexity is proportional to k.


Initialization Phase 


1. The sender chooses $G$, $g$ and $(x_1, \ldots, x_n, r) \in (\mathbb{Z}_q)^{n+1}$ randomly, and computes $h = g^r$.
2. For $i = 1, \ldots, n$, the sender computes

$$C_i = (A_i, B_i) = (g^{x_i}, M_i \cdot h^{x_i}),$$

where $M_1, \ldots, M_n \in G$.

3. The sender sends $(G, h, C_1, \ldots, C_n)$.

4. The sender proves by ZK-PoK that he knows $r$.
The protocol stops if the receiver rejects.


The jth Transfer Phase 


1. The receiver chooses a choice index $1 \le \sigma_j \le n$ based on $M_{\sigma_1}, \ldots, M_{\sigma_{j-1}}$.
2. The receiver chooses $u \in \mathbb{Z}_q$ randomly and computes $U = (A_{\sigma_j})^u$.
He then sends $U$.
3. The receiver proves in WI-PoK that he knows $u$ such that

$$U = A_1^u \lor \cdots \lor U = A_n^u.$$

The protocol stops if the sender rejects.
4. The sender computes $V = U^r$ and sends $V$.
5. The sender proves in ZK-PoM that $(g, h, U, V)$ is a DDH-tuple.
The protocol stops if the receiver rejects.
6. The receiver obtains $M_{\sigma_j}$ by computing $B_{\sigma_j} / V^{1/u}$.
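The algebra of the two phases can be traced with a short sketch. It is not a secure implementation: the three proofs (ZK-PoK, WI-PoK, ZK-PoM) are omitted entirely, the group is a toy one (the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 2), $u$ is sampled nonzero so that $1/u$ exists, and all function names are assumptions made for illustration. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import random

# Toy group (assumption for illustration): the order-11 subgroup of Z_23^*, generated by g = 2.
p, q, g = 23, 11, 2


def sender_init(messages, rng):
    """Initialization phase without the ZK-PoK: returns (secret r, public h, ciphertexts C_i)."""
    r = rng.randrange(q)
    h = pow(g, r, p)
    ciphertexts = []
    for M in messages:
        x = rng.randrange(q)
        ciphertexts.append((pow(g, x, p), M * pow(h, x, p) % p))   # (A_i, B_i)
    return r, h, ciphertexts


def receiver_blind(ciphertexts, sigma, rng):
    """Transfer step 2: blind A_sigma with a random nonzero exponent u."""
    u = rng.randrange(1, q)
    A, _ = ciphertexts[sigma - 1]
    return u, pow(A, u, p)


def sender_respond(r, U):
    """Transfer step 4: V = U^r."""
    return pow(U, r, p)


def receiver_unblind(ciphertexts, sigma, u, V):
    """Transfer step 6: M_sigma = B_sigma / V^(1/u)."""
    _, B = ciphertexts[sigma - 1]
    mask = pow(V, pow(u, -1, q), p)      # V^(1/u) = h^(x_sigma)
    return B * pow(mask, -1, p) % p


rng = random.Random(2009)
messages = [pow(g, e, p) for e in (1, 4, 7)]    # messages must lie in the group G
r, h, cts = sender_init(messages, rng)
recovered = []
for sigma in (3, 1, 2):
    u, U = receiver_blind(cts, sigma, rng)
    V = sender_respond(r, U)
    recovered.append(receiver_unblind(cts, sigma, u, V))
```

Correctness follows because $V^{1/u} = A_{\sigma_j}^{r} = h^{x_{\sigma_j}}$, which is exactly the ElGamal mask on $M_{\sigma_j}$.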
The three ZK or WI proof systems in the scheme are constructed efficiently as follows.


- An efficient 4-round ZK-PoK exists which can be used in the initialization phase. It is obtained by applying the technique of [A] to Schnorr's identification scheme [9].

- An efficient 3-round WI-PoK exists which can be used in the transfer phase. It is implemented by applying the or-composition technique [5] to [9].

- An efficient 4-round ZK-PoM exists which can be used in the transfer phase. It comes from the confirmation protocol of Chaum's undeniable signature scheme (which is a ZK-PoM for the DDH-tuple [3]).


Theorem 1. The above protocol is a fully-simulatable adaptive $OT^n_{k\times 1}$ under the DDH assumption.


The proof is given in Section 6.


5 Extension to Fully Simulatable Non-adaptive OT 


In this section, we extend our adaptive OT to a fully simulatable non-adaptive 
OT which requires constant rounds. 


5.1 How to Prove Many DDH-Tuples 


We show a 4-round ZK-PoM which proves that $(g, h, U_1, V_1), \ldots, (g, h, U_k, V_k)$ are all DDH-tuples.


1. The receiver sends random $(a_1, \ldots, a_k)$.
2. The sender proves that $(g, h, \prod_{i=1}^{k} U_i^{a_i}, \prod_{i=1}^{k} V_i^{a_i})$ is a DDH-tuple by using the confirmation protocol of [3].


The confirmation protocol of [3] is a 4-round ZK-PoM on a DDH-tuple. Hence the above protocol runs in 4 rounds. (Step 1 and the 1st round of the confirmation protocol are merged.)


Lemma 1. Suppose that some $(g, h, U_i, V_i)$ is not a DDH-tuple. Then $(g, h, \prod_{i=1}^{k} U_i^{a_i}, \prod_{i=1}^{k} V_i^{a_i})$ is a DDH-tuple with negligible probability.

Proof. Assume that $U_i = g^{x_i}$ and $V_i = h^{y_i}$ for $i = 1, \ldots, k$. Then

$$\prod_{i=1}^{k} U_i^{a_i} = g^{\sum_{i} a_i x_i}, \qquad \prod_{i=1}^{k} V_i^{a_i} = h^{\sum_{i} a_i y_i}.$$

Suppose that $(g, h, U_1, V_1)$ is not a DDH-tuple. That is, $x_1 \ne y_1$. Then for any values of $a_2, \ldots, a_k$, there exists a unique $a_1$ such that

$$\sum_{i=1}^{k} a_i (x_i - y_i) = 0 \bmod q. \qquad (1)$$

Hence the number of $(a_1, \ldots, a_k)$ which satisfy eq.(1) is equal to $q^{k-1}$. Therefore

$$\Pr(\text{eq.(1) holds}) = q^{k-1}/q^{k} = 1/q.$$

This means that $(g, h, \prod_{i=1}^{k} U_i^{a_i}, \prod_{i=1}^{k} V_i^{a_i})$ is a DDH-tuple with negligible probability.
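The counting argument in the proof can be checked by brute force for a small prime. The sketch below is illustrative only: the function name `count_bad_challenges` and the sample values are assumptions, and `diffs[i]` stands for $x_i - y_i \bmod q$.

```python
from itertools import product


def count_bad_challenges(q, diffs):
    """Count (a_1, ..., a_k) in Z_q^k with sum_i a_i * d_i = 0 mod q, where d_i = x_i - y_i."""
    k = len(diffs)
    return sum(
        1
        for a in product(range(q), repeat=k)
        if sum(ai * di for ai, di in zip(a, diffs)) % q == 0
    )


q = 11
diffs = [3, 0, 5]                      # d_1 != 0: the first tuple is not a DDH-tuple
bad = count_bad_challenges(q, diffs)   # challenges for which the combined tuple is a DDH-tuple
fraction = bad / q ** len(diffs)
```

With at least one nonzero difference, exactly $q^{k-1}$ of the $q^k$ challenge vectors make the combined tuple a DDH-tuple, so a random challenge does so with probability $1/q$.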


Theorem 2. The above protocol is a ZK-PoM on many DDH-tuples. 


Proof. The completeness is clear. The zero-knowledgeness follows from that of the confirmation protocol of [3]. The soundness follows from Lemma 1 and that of the confirmation protocol of [3].


5.2 Constant Round $OT^n_k$

In this section, we modify our $OT^n_{k\times 1}$ to obtain a constant round $OT^n_k$ as follows.


- At step 4 of the initialization phase, the sender sends $(G, h, A_1, \ldots, A_n)$.

- At the end of the transfer phase, the sender sends $(B_1, \ldots, B_n)$.

- In the transfer phase, run step 3 in parallel (still it is a WI protocol). At step 5, the sender proves that $(g, h, U_1, V_1), \ldots, (g, h, U_k, V_k)$ are all DDH-tuples by using the ZK-PoM of Sec. 5.1.


Theorem 3. The proposed $OT^n_k$ is a constant round fully-simulatable $OT^n_k$ under the DDH assumption.


The proof is similar to that of Theorem 1.


6 Proof of Theorem 1


We first prove that the proposed scheme is secure against sender corruption. We next prove that it is secure against receiver corruption.


6.1 Security against Sender Corruption

Lemma 2. The proposed scheme is secure against sender corruption.


Proof. For every real-world adversary A who corrupts the sender, we construct an ideal-world adversary A' such that Adv(Z) is negligible.

We will consider a sequence of games $\mathrm{Game}_0, \mathrm{Game}_1, \ldots, \mathrm{Game}_4$, where $\mathrm{Game}_0$ is the real world experiment of Sec. 3.2 and $\mathrm{Game}_4$ is the ideal world experiment. Let

$$\Pr(\mathrm{GAME}_i) = \Pr(Z = 1 \text{ in } \mathrm{Game}_i).$$
$\mathrm{Game}_0$: This is the real world experiment such that the sender is controlled by an adversary A. Hence

$$\Pr(\mathrm{GAME}_0) = \Pr(Z = 1 \text{ in the real world}).$$


$\mathrm{Game}_1$: This is the same as the previous game except for the following. In the initialization phase, if the receiver accepts the ZK-PoK, then he extracts $r$ from A by running the knowledge extractor $E_1$, which is allowed to rewind A. This game outputs $\bot$ if the extractor $E_1$ fails in extracting $r$. Unless this happens, these two games are identical. Therefore,

$$|\Pr(\mathrm{GAME}_0) - \Pr(\mathrm{GAME}_1)| \le \kappa_1,$$

where $\kappa_1$ is the knowledge error of the extractor.


$\mathrm{Game}_2$: This is the same as the previous game except for the following. In each transfer phase, if the receiver accepts the ZK-PoM which proves that $(g, h, U, V)$ is a DDH-tuple, then he obtains $M_{\sigma_j}$ by computing $B_{\sigma_j}/A_{\sigma_j}^{r}$. These two games are identical unless the above $M_{\sigma_j}$ is different from $B_{\sigma_j}/V^{1/u}$. This happens if the receiver accepts the ZK-PoM even though $(g, h, U, V)$ is not a DDH-tuple. Hence

$$|\Pr(\mathrm{GAME}_1) - \Pr(\mathrm{GAME}_2)| \le k\kappa_3,$$

where $\kappa_3$ is the soundness error probability of the ZK-PoM.

$\mathrm{Game}_3$: This is the same as the previous game except for the following. In each transfer phase, the receiver computes $U$ as $U = A_1^u$. (The receiver can still obtain $M_{\sigma_j}$, as can be seen from $\mathrm{Game}_2$.) Since our WI-PoK is perfect,

$$\Pr(\mathrm{GAME}_2) = \Pr(\mathrm{GAME}_3).$$


$\mathrm{Game}_4$: This game is the ideal world experiment in which an ideal-world adversary A' plays the role of the receiver of $\mathrm{Game}_3$ and uses A as a blackbox. A' can do this because the receiver does not use $\sigma_1, \ldots, \sigma_k$ in $\mathrm{Game}_3$.

Finally A' outputs what A outputs. It is easy to see that $\mathrm{Game}_3$ and $\mathrm{Game}_4$ are identical from the viewpoint of Z. Hence

$$\Pr(\mathrm{GAME}_3) = \Pr(\mathrm{GAME}_4).$$

Further

$$\Pr(\mathrm{GAME}_4) = \Pr(Z = 1 \text{ in the ideal world}).$$

Now, we can summarize this lemma as follows:

$$\mathrm{Adv}(Z) = |\Pr(\mathrm{GAME}_4) - \Pr(\mathrm{GAME}_0)| \le \sum_{i=0}^{3} |\Pr(\mathrm{GAME}_{i+1}) - \Pr(\mathrm{GAME}_i)| \le \kappa_1 + k\kappa_3.$$
6.2 Security against Receiver Corruption


Lemma 3. The proposed scheme is secure against receiver corruption under the DDH assumption.


Proof. For every real-world adversary A who corrupts the receiver, we construct an ideal-world adversary A' such that Adv(Z) is negligible.

We will consider a sequence of games $\mathrm{Game}_0, \mathrm{Game}_1, \ldots, \mathrm{Game}_5$, where $\mathrm{Game}_0$ is the real world experiment of Sec. 3.2 and $\mathrm{Game}_5$ is the ideal world experiment.


$\mathrm{Game}_0$: This is the real world experiment such that the receiver is controlled by an adversary A. Hence

$$\Pr(\mathrm{GAME}_0) = \Pr(Z = 1 \text{ in the real world}).$$


$\mathrm{Game}_1$: This is the same as the previous game except for the following. In each transfer phase, instead of running the ZK-PoM which proves that $(g, h, U, V)$ is a DDH-tuple, the sender runs the zero-knowledge simulator of the ZK-PoM, which is allowed to rewind A. Since the ZK-PoM is perfect ZK, we have

$$\Pr(\mathrm{GAME}_1) = \Pr(\mathrm{GAME}_0).$$


$\mathrm{Game}_2$: This is the same as the previous game except for the following. In each transfer phase, if the sender accepts the WI-PoK, then she extracts $u$ from A by running the knowledge extractor $E_2$, which is allowed to rewind A. This game outputs $\bot$ if the extractor $E_2$ fails in extracting $u$. Unless this happens, these two games are identical. Therefore,

$$|\Pr(\mathrm{GAME}_2) - \Pr(\mathrm{GAME}_1)| \le k\kappa_2,$$

where $\kappa_2$ is the knowledge error of the extractor.


$\mathrm{Game}_3$: This is the same as the previous game except that the sender computes $V$ as $V = (B_{\sigma_j}/M_{\sigma_j})^{u}$ instead of $V = U^{r}$. It is clear that there is no essential difference between the two games. Therefore,

$$\Pr(\mathrm{GAME}_3) = \Pr(\mathrm{GAME}_2).$$


$\mathrm{Game}_4$: This is the same as the previous game except that the sender uses a random $M_i$ to compute each $C_i$ in the initialization phase. The difference $|\Pr(\mathrm{GAME}_4) - \Pr(\mathrm{GAME}_3)|$ is still negligible by the semantic security of the ElGamal cryptosystem, which is implied by the DDH assumption.


Claim. If the DDH problem is hard then $|\Pr(\mathrm{GAME}_4) - \Pr(\mathrm{GAME}_3)|$ is negligible. More concretely,

$$|\Pr(\mathrm{GAME}_4) - \Pr(\mathrm{GAME}_3)| \le \epsilon_{DDH}. \qquad (2)$$


The proof of this claim is given later.


344 K. Kurosawa and R. Nojima 


$\mathrm{Game}_5$: This game is the ideal world experiment in which an ideal-world adversary A' plays the role of the sender of $\mathrm{Game}_4$, and uses A as a blackbox. A' can do this because the sender does not use $M_1, \ldots, M_n$ in $\mathrm{Game}_4$.

Finally A' outputs what A outputs. It is easy to see that $\mathrm{Game}_4$ and $\mathrm{Game}_5$ are identical from the viewpoint of Z. Hence

$$\Pr(\mathrm{GAME}_4) = \Pr(\mathrm{GAME}_5).$$

Further

$$\Pr(\mathrm{GAME}_5) = \Pr(Z = 1 \text{ in the ideal world}).$$

Now, we can summarize this lemma as follows:

$$\mathrm{Adv}(Z) = |\Pr(\mathrm{GAME}_5) - \Pr(\mathrm{GAME}_0)| \le \sum_{i=0}^{4} |\Pr(\mathrm{GAME}_{i+1}) - \Pr(\mathrm{GAME}_i)| \le k\kappa_2 + \epsilon_{DDH}.$$


To complete the proof, we must provide the proof of the claim. To do so, we need the following lemma,³ which can be thought of as an "extended" version of the DDH assumption.


Lemma 4 (Lemma 4.2 in [17]). If there exists a probabilistic algorithm $D$ with running time $t$ such that

$$\Pr(D(g, g^{r}, g^{x_1}, \ldots, g^{x_n}, g^{rx_1}, \ldots, g^{rx_n}) = 1) - \Pr(D(g, g^{r}, g^{x_1}, \ldots, g^{x_n}, g^{z_1}, \ldots, g^{z_n}) = 1) \ge \epsilon,$$

where the probability is taken over the random bits of $D$, the random choice of the generator $g$ in $G$, and the random choice of $x_1, \ldots, x_n, r, z_1, \ldots, z_n \in \mathbb{Z}_q$, then there exists a probabilistic algorithm with running time $n \cdot \mathrm{poly}(\tau) + t$ that breaks the DDH assumption with probability $\ge \epsilon$, for some polynomial poly.


We now show a proof of the claim. 


Proof (of the claim). Let $\mathrm{Game}'_3$ ($\mathrm{Game}'_4$) be the same as $\mathrm{Game}_3$ ($\mathrm{Game}_4$) except for the following. In the initialization phase, instead of running the ZK-PoK in which the sender proves that he knows $r$, the sender runs the zero-knowledge simulator of the ZK-PoK, which is allowed to rewind A. Since the ZK-PoK is perfect ZK, it holds that

$$\Pr(\mathrm{GAME}'_3) = \Pr(\mathrm{GAME}_3), \qquad \Pr(\mathrm{GAME}'_4) = \Pr(\mathrm{GAME}_4).$$


3 Naor and Reingold proved it by using the random reducibility of the DDH-tuple. 




We now construct a DDH distinguisher $D$ in the sense of Lemma 4. The input to $D$ is $(g, h, g^{x_1}, \ldots, g^{x_n}, y_1, \ldots, y_n)$, where $y_i = g^{rx_i}$ or $g^{z_i}$. Our $D$ simulates Z, A and the sender of $\mathrm{Game}'_3$ or $\mathrm{Game}'_4$ faithfully, except that in the initialization phase, $D$ simulates the sender by using $(g, h, g^{x_1}, \ldots, g^{x_n})$, and $h^{x_i} = y_i$ for each $i$. Finally $D$ outputs 1 iff Z outputs 1.

It is easy to see that $D$ simulates $\mathrm{Game}'_3$ if $y_i = g^{rx_i}$ for each $i$, and $\mathrm{Game}'_4$ otherwise. Therefore

$$|\Pr(\mathrm{GAME}'_4) - \Pr(\mathrm{GAME}'_3)| \le \epsilon_{DDH}. \qquad (3)$$

Hence eq.(2) holds.


7 Fully Simulatable $OT^2_1$


We have constructed a fully-simulatable adaptive OT under the DDH assumption in the standard model. It is clear that we can obtain a fully-simulatable (1,2)-OT ($OT^2_1$) as a special case.

On the other hand, Lindell showed a fully simulatable $OT^2_1$ under the DDH, Paillier's decisional Nth residuosity, and quadratic residuosity assumptions, as well as under the assumption that homomorphic encryption exists, in the standard model [13].

Let us compare our scheme with Lindell's $OT^2_1$ which is based on the DDH assumption. His scheme builds on an earlier $OT^2_1$ protocol and uses a cut-and-choose technique. Its computational cost and communication cost are $O(\ell)$ times larger than those of our first scheme in order to achieve

$$\mathrm{Adv}(Z) \le 2^{-\ell}.$$

Hence our scheme is more efficient.
Abstract. An r-collision for a function is a set of r distinct inputs with identical outputs. Actually finding r-collisions for a random map over a finite set of cardinality N requires at least about $N^{(r-1)/r}$ units of time on a sequential machine. For r = 2, memoryless and well-parallelizable algorithms are known. The current paper describes memory-efficient and parallelizable algorithms for $r \ge 3$. The main results are: (1) A sequential algorithm for 3-collisions, roughly using memory $N^{\alpha}$ and time $N^{1-\alpha}$ for $\alpha \le 1/3$. In particular, given $N^{1/3}$ units of storage, one can find 3-collisions in time $N^{2/3}$. (2) A parallelization of this algorithm using $N^{1/3}$ processors running in time $N^{1/3}$, where each single processor only needs a constant amount of memory. (3) A generalisation of this second approach to r-collisions for $r \ge 3$: given $N^{s}$ parallel processors, with $s \le (r-2)/r$, one can generate r-collisions roughly in time $N^{((r-1)/r)-s}$, using memory $N^{((r-2)/r)-s}$ on every processor.


Keywords: multicollision, random map, memory-efficient, parallel im- 
plementation, cryptanalysis. 


1 Introduction 


The problem of finding collisions and multicollisions in random mappings is of significant interest for cryptography, and mainly for cryptanalysis. It is well known that finding an r-collision for a random map over a finite set of cardinality N requires¹ more than $N^{(r-1)/r}$ map evaluations.


Multicollisions for hash functions. If the map under consideration is a hash function, or has been derived from a hash function, many researchers consider faster multicollisions as a certificational hash function weakness. Accordingly, it was worrying for the research community to learn that multicollisions could be found much faster for a widely used class of hash functions: iterated hash functions [9]. For n-bit hash functions from this class, one can generate $2^k$-collisions in time $k \cdot 2^{n/2}$, rather than the expected time $2^{n(2^k-1)/2^k}$. The basic observation is straightforward: given a sequence of k consecutive 2-collisions, it is possible with iterated hash functions to consider the $2^k$ different messages obtained by taking all possible choices of message block for each collision and obtain $2^k$ times the same output. These iterated multicollisions have been generalized later to more complex types of iterated hash functions, for example, see BH3IA. It was also remarked that these iterated multicollisions were a rediscovery and generalization of an older attack of Coppersmith B].


1 An r-collision is a set of r different inputs $x_1, \ldots, x_r$ which all generate the same output $\mathrm{map}(x_1) = \cdots = \mathrm{map}(x_r)$. For an r-collision, one needs to evaluate the map $(r!)^{1/r} \cdot N^{(r-1)/r}$ times [3]. For small r, we can approximate this by $O(N^{(r-1)/r})$.

In particular, this type of multicollisions allowed a surprising attack on hash cascades, i.e., hash functions $H$ which are the concatenation of two hash functions $G_1$ and $G_2$, i.e., $H(X) := (G_1(X), G_2(X))$. If, say, $G_1$ is an iterated hash function and vulnerable to the multicollision attack, and $G_2$ is any n-bit hash function, the adversary just needs to generate a $2^{n/2}$-multicollision for $G_1$. Thanks to the birthday paradox, among the $2^{n/2}$ messages colliding for $G_1$, one expects to find a pair of messages colliding for $G_2$ with constant probability. As a consequence, a collision for the 2n-bit hash function $H$ can be obtained with much less than $2^{n}$ hash evaluations.
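The iterated multicollision idea can be illustrated on a toy hash. The sketch below is an assumption-laden example rather than the attack on any real function: the compression function is SHA-256 truncated to a 16-bit chaining value, there is no padding or length strengthening, and the names `compress`, `find_block_collision`, `iterated_multicollision` and `iterated_hash` are hypothetical.

```python
import hashlib
from itertools import product


def compress(state, block):
    """Toy compression function with a 16-bit chaining value (illustration only)."""
    data = state.to_bytes(2, "big") + block
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def find_block_collision(state):
    """Birthday search: two distinct blocks mapping `state` to the same next state."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(4, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block, out
        seen[out] = block
        counter += 1


def iterated_multicollision(iv, k):
    """Chain k consecutive 2-collisions; they span 2^k colliding messages."""
    pairs, state = [], iv
    for _ in range(k):
        b0, b1, state = find_block_collision(state)
        pairs.append((b0, b1))
    return pairs, state


def iterated_hash(iv, blocks):
    """Plain iteration of the compression function over the message blocks."""
    state = iv
    for block in blocks:
        state = compress(state, block)
    return state


pairs, final_state = iterated_multicollision(iv=0, k=3)
colliding_messages = list(product(*pairs))   # pick one block from each colliding pair
```

Each of the $k$ birthday searches costs about $2^{n/2}$ compression calls (here $n = 16$), while the resulting $2^k$ messages all hash to `final_state`.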


Multicollisions for random maps. In contrast to [9], we consider generic attacks, and, accordingly, we model our functions as random maps. In that case, the number $N^{(r-1)/r}$ is a lower bound on the sequential time required for finding an r-collision, and time-optimal algorithms are well-known. Furthermore, it is well-known how to find ordinary collisions (aka 2-collisions) with negligible memory (using Floyd, Brent or Nivasch cycle finding algorithms), and also how to parallelize these algorithms using distinguished point methods [BIIP].
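As a reminder of how memoryless 2-collision search works, here is a minimal sketch of Floyd's cycle-finding method. The toy map `f` (SHA-256 truncated to 16 bits), the constant `BITS`, and the function names are assumptions made for illustration; the method itself only needs a function from a finite set to itself.

```python
import hashlib

BITS = 16  # toy size (assumption): f maps {0, ..., 2^16 - 1} to itself


def f(x):
    """Toy stand-in for a random map on a set of cardinality N = 2^16."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << BITS)


def floyd_collision(f, start):
    """Return (a, b) with a != b and f(a) == f(b), or None if `start` lies on the cycle."""
    # Phase 1: tortoise and hare meet somewhere on the cycle.
    tortoise, hare = f(start), f(f(start))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: walk from the start and from the meeting point in lockstep;
    # the two walks merge at the cycle entry, so their predecessors collide.
    tortoise = start
    if tortoise == hare:
        return None
    while f(tortoise) != f(hare):
        tortoise, hare = f(tortoise), f(hare)
    return tortoise, hare


def find_collision(f):
    """Try successive starting points until one yields a collision."""
    start = 0
    while True:
        result = floyd_collision(f, start)
        if result is not None:
            return result
        start += 1


a, b = find_collision(f)
```

Only a constant number of values is stored at any time, at the cost of a small constant factor in the number of evaluations of `f`.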

In general, the issue of memory-efficient and parallelizable r-collision algorithms appears to be an unsolved question. Authors usually assume $N^{(r-1)/r}$ units of memory (i.e., the maximum any algorithm can use in the given amount of time) and neglect parallelization entirely. For recent examples of the application of multicollisions to cryptography, see, e.g., the cryptanalysis of the SHA-3 candidates Aurora-512 and JH-512 [29]. We stress that these attacks employ generic multicollisions as a part of their attacks, always assuming maximum memory and ignoring the issue of parallel implementations.

So the question is, do authors need to be so pessimistic, or are there memory-efficient and parallelizable algorithms for r-collisions? For small r, and mainly for r = 3, the current paper provides a clearly positive answer. As an application of our results, we will observe attacks on the SHA-3 candidate hash function Aurora-512. These attacks make heavy use of multicollisions on internal structures. Some attacks on other SHA-3 candidates do not benefit from our algorithms, for different reasons. See the appendix.


Notation. To avoid writing cumbersome logarithmic factors, we often express running times using the soft-Oh-notation. Namely, $\tilde{O}(g(n))$ is used as a shorthand for $O(g(n) \cdot \log(g(n))^{k})$ for some fixed $k$.
While the number of values that needs to be computed before a 3-collision can be formed is often considered and analyzed, e.g. in [17, Appendix B] or [22], the known algorithmic method to find such a 3-collision is rarely considered in detail and is mostly folklore. In order to compare the new algorithms which we describe in later sections with existing algorithms, we thus give a precise description of the folklore algorithm, together with a larger variety of time/memory tradeoffs. Throughout this section, we fix two parameters $\alpha$ and $\beta$ and consider 3-collisions for a function $F$ defined on a set of cardinality $N$. The parameter $\alpha$ controls the amount of memory, limiting it to $\tilde{O}(N^{\alpha})$. Similarly, $\beta$ controls the running time, at $\tilde{O}(N^{\beta})$. Of course, these parameters need to satisfy the relation $\alpha \le \beta$.

We consider Algorithm 1. This algorithm is straightforward. First, it computes,
stores and sorts $N^\alpha$ images of random points under F. For bookkeeping
purposes, it also keeps track of the corresponding preimages. Second, it computes
$N^\beta$ additional images of random points and seeks each in the precomputed
table. Whenever a hit occurs, it is stored together with the initial preimage in
the sorted table. The algorithm succeeds if one of the $N^\alpha$ original images
is found twice more during the second phase and if the three corresponding
preimages are distinct. In the formal description given as Algorithm 1, we added
an optional step which packs colliding values generated during the first step
into the same array element. If this optional step is omitted, then the early
collisions are implicitly discarded. Indeed, in the second phase, we make sure
that the search algorithm always returns the first position where a given value
occurs among the known images F(x). During the complexity analysis, we ignore the
optional packing step since it runs in time $N^\alpha$ and can only improve the
overall running time by making the algorithm stop earlier.

We now perform a rough heuristic analysis of Algorithm 1, where constants
and logarithmic factors are ignored. On average, among the $N^\beta$ images of
the second phase, we expect that $N^{\alpha+\beta-1}$ values hit the sorted table
of $N^\alpha$ elements. Due to the birthday paradox, after $N^{\alpha/2}$ hits, we
expect a double hit to occur. At that point, the algorithm succeeds if the three
known preimages corresponding to the double hit are distinct, which occurs with
constant probability. For the algorithm to succeed, we need:

$$\alpha + \beta - 1 \geq \alpha/2.$$

As a consequence, to minimize the running time, we enforce the condition:

$$\alpha + 2\beta = 2. \qquad (1)$$

For $\alpha = \beta$, we find $\alpha = \beta = 2/3$ and obtain the classical
folklore result with time and memory $\tilde{O}(N^{2/3})$. Other tradeoffs are
also possible. With constant memory, i.e. $\alpha = 0$, we find a running time
$\tilde{O}(N)$. Another tradeoff with $\alpha = 1/2$ and $\beta = 3/4$ will be
used as a point of comparison in Section 3.
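To make the tradeoff curve of condition (1) concrete, the following minimal Python sketch computes the time exponent from a chosen memory exponent using exact fractions. The function name `folklore_time_exponent` is only an illustrative assumption and does not appear in the paper.

```python
from fractions import Fraction


def folklore_time_exponent(alpha):
    """Return beta such that alpha + 2*beta = 2, i.e. condition (1).

    The range check enforces alpha <= beta, which is equivalent to alpha <= 2/3.
    """
    alpha = Fraction(alpha)
    if not 0 <= alpha <= Fraction(2, 3):
        raise ValueError("need 0 <= alpha <= 2/3 so that alpha <= beta")
    return (2 - alpha) / 2


if __name__ == "__main__":
    for alpha in (Fraction(0), Fraction(1, 2), Fraction(2, 3)):
        beta = folklore_time_exponent(alpha)
        print(f"memory N^({alpha})  ->  time N^({beta})")
```

The three sample values correspond to the constant-memory case, the comparison point with memory exponent 1/2, and the balanced folklore case.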
Algorithm 1. Folklore 3-collision finding algorithm
Require: Oracle access to F operating on [0, N - 1]
Require: Parameters: α ≤ β satisfying condition (1)
  Let N_α ← ⌈N^α⌉
  Let N_β ← ⌈N^β⌉
  Create arrays Img, Pr_1 and Pr_2 of N_α elements.

First step:
  for i from 1 to N_α do
    Let a ←_R [0, N - 1]
    Let Img[i] ← F(a)
    Let Pr_1[i] ← a
    Let Pr_2[i] ← ⊥
  end for
  Sort Img, applying the same permutation on elements of Pr_1 and Pr_2

Optional step (packing of existing collisions):
  Let i ← 1
  while i < N_α do
    Let j ← i + 1
    while j ≤ N_α and Img[i] == Img[j] do
      if Pr_1[i] ≠ Pr_1[j] then
        if Pr_2[i] == ⊥ then
          Let Pr_2[i] ← Pr_1[j]
        else
          if Pr_2[i] ≠ Pr_1[j] then
            Output ‘3-Collision (Pr_1[i], Pr_2[i], Pr_1[j]) under F’ and Exit
          end if
        end if
      end if
      Let j ← j + 1
    end while
    Let i ← j
  end while

Second step:
  for i from 1 to N_β do
    Let a ←_R [0, N - 1]
    Let b ← F(a)
    if b is in Img (first occurrence in position j) then
      if Pr_1[j] ≠ a then
        if Pr_2[j] == ⊥ then
          Let Pr_2[j] ← a
        else
          if Pr_2[j] ≠ a then
            Output ‘3-Collision (Pr_1[j], Pr_2[j], a) under F’ and Exit
          end if
        end if
      end if
    end if
  end for
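As an illustration of Algorithm 1, here is a minimal Python sketch on a toy domain. It is not the authors' implementation: the oracle F is replaced by a seeded random table (`make_random_function`, a hypothetical helper), and a dictionary replaces the sorted array with binary search, which also performs the optional packing step implicitly.

```python
import random


def make_random_function(n, seed):
    """Toy stand-in for the oracle F: a fixed random table on [0, n - 1]."""
    rng = random.Random(seed)
    table = [rng.randrange(n) for _ in range(n)]
    return lambda x: table[x]


def folklore_3collision(f, n, n_alpha, n_beta, rng):
    """Return three distinct points with the same image under f, or None."""
    # First step: store n_alpha images together with their known preimages.
    # Grouping preimages per image also performs the optional packing step.
    known = {}
    for _ in range(n_alpha):
        a = rng.randrange(n)
        preimages = known.setdefault(f(a), set())
        preimages.add(a)
        if len(preimages) >= 3:
            return tuple(sorted(preimages))

    # Second step: sample n_beta further points and look each image up.
    for _ in range(n_beta):
        a = rng.randrange(n)
        preimages = known.get(f(a))
        if preimages is not None:
            preimages.add(a)  # a set ignores replays of a known preimage
            if len(preimages) >= 3:
                return tuple(sorted(preimages))
    return None


if __name__ == "__main__":
    f = make_random_function(4096, seed=1)
    print(folklore_3collision(f, 4096, 2048, 20000, random.Random(2)))
```

The parameters in the usage example are deliberately generous for a domain of size 4096, so that the toy run succeeds with overwhelming probability; they are not tuned to condition (1).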
3 A New Algorithm for 3-Collisions


Now equipped with an analysis of Algorithm 1, we are ready to propose a new
algorithm which offers different time-memory tradeoffs, which are better balanced
for existing hardware. The basic idea is extremely simple: Instead of initializing
an array with $N^\alpha$ images, we propose to initialize it with $N^\alpha$
collisions under F. To make this efficient in terms of memory use, each collision
in the array is generated using a cycle finding algorithm on a (pseudo-)randomly
permuted copy of F. Since each collision is found in time $N^{1/2}$, the total
running time of this new first step is $N^{1/2+\alpha}$.

The second step is left unchanged: we simply create $N^\beta$ images of random
points until we hit one of the known collisions. Note that, thanks to the new
first phase, it now suffices to land once on a known point to succeed. As a
consequence, we can replace condition (1) by the weaker condition:

$$\alpha + \beta = 1. \qquad (2)$$

Since the running time of the first step is $N^{1/2+\alpha}$, it would not make
sense to have $\beta < 1/2 + \alpha$. Thus, we also enforce the condition
$\alpha \leq 1/4$. Under this condition, the new algorithm runs in time
$\tilde{O}(N^{1-\alpha})$ using $\tilde{O}(N^\alpha)$ bits of memory. In
particular, we can find 3-collisions in time $\tilde{O}(N^{3/4})$ using
$\tilde{O}(N^{1/4})$ bits of memory. This is a notable improvement over
Algorithm 1, which requires $\tilde{O}(N^{1/2})$ bits of memory to achieve the
same running time.


Note on the creation of the $N^\alpha$ initial collisions. One question that
frequently arises when this algorithm is presented is: “Why is it necessary to
randomize F with a pseudo-random permutation?”

Behind this question is the idea that changing the starting point of the cycle
finding algorithm should suffice to obtain random collisions. However, this is not
true. Indeed, the analysis of random mappings (for example, see [4]) shows that
on average a constant fraction of points belong to a so-called “giant tree”. By
definition, each starting point in the giant tree enters the main cycle in the same
place. As a consequence, without randomization of F the corresponding collision
would be generated over and over again and the 3-collision algorithm would not
work.
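The new first step can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' code: F is a seeded random table, the keyed permutation is replaced by a freshly shuffled list (which costs memory N and is therefore only acceptable in a toy setting; a real implementation would use a keyed permutation such as a block cipher), Floyd's method is used as the cycle finding algorithm, and a dictionary replaces the sorted array. All function names are hypothetical.

```python
import math
import random


def make_random_function(n, seed):
    """Toy stand-in for the oracle F: a fixed random table on [0, n - 1]."""
    rng = random.Random(seed)
    table = [rng.randrange(n) for _ in range(n)]
    return lambda x: table[x]


def floyd_collision(g, start, max_steps):
    """Return (a, b) with a != b and g(a) == g(b), or None on failure."""
    tortoise, hare = g(start), g(g(start))
    steps = 1
    while tortoise != hare:  # phase 1: meet somewhere inside the cycle
        if steps >= max_steps:
            return None  # time limit exceeded, the caller restarts
        tortoise, hare = g(tortoise), g(g(hare))
        steps += 1
    tortoise = start
    if tortoise == hare:
        return None  # start lies on the cycle: no tail, hence no collision
    while True:  # phase 2: walk both pointers to the cycle entry
        next_t, next_h = g(tortoise), g(hare)
        if next_t == next_h:
            return tortoise, hare
        tortoise, hare = next_t, next_h


def improved_3collision(f, n, n_alpha, n_beta, rng):
    """Toy version of the collision-seeded search; returns 3 points or None."""
    limit = 8 * math.isqrt(n)  # abort bound of the form lambda * sqrt(N)
    known = {}  # image under f -> set of known preimages
    stored = 0
    while stored < n_alpha:
        perm = list(range(n))
        rng.shuffle(perm)  # toy stand-in for a keyed permutation
        hit = floyd_collision(lambda x: f(perm[x]), rng.randrange(n), limit)
        if hit is None:
            continue
        a, b = perm[hit[0]], perm[hit[1]]  # distinct preimages under f itself
        preimages = known.setdefault(f(a), set())
        preimages.update((a, b))
        stored += 1
        if len(preimages) >= 3:
            return tuple(sorted(preimages))[:3]
    for _ in range(n_beta):  # second step: one hit on a stored image suffices
        a = rng.randrange(n)
        preimages = known.get(f(a))
        if preimages is not None:
            preimages.add(a)
            if len(preimages) >= 3:
                return tuple(sorted(preimages))[:3]
    return None
```

Drawing a fresh permutation for every collision is what prevents the search from returning the same giant-tree collision repeatedly.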


4 Detailed Complexity Analysis of Algorithms 1 and 2


In this section, we analyze in more detail the complexity and success probability
of Algorithms 1 and 2, assuming that F is a random mapping. This detailed
analysis particularly focuses on the following problematic issues, which were
initially neglected:


1. Among the $N_\alpha$ candidates stored in Img and its companion arrays, which
fraction can non-trivially be completed into a 3-collision?

2. In the second step, when a value F(a) hits the array Img, what is the
probability of obtaining a real 3-collision and not simply replaying a known
value of a?
Algorithm 2. Improved 3-collision finding algorithm
Require: Oracle access to F operating on [0, N - 1]
Require: Family of pseudo-random permutations Π_K, indexed by K in 𝒦
Require: Parameters: α ≤ β satisfying condition (2)
  Let N_α ← ⌈N^α⌉
  Let N_β ← ⌈N^β⌉
  Create arrays Img, Pr_1 and Pr_2 of N_α elements.

First step:
  for i from 1 to N_α do
    Let K ←_R 𝒦
    Use cycle finding algorithm on F ∘ Π_K to produce collision F ∘ Π_K(a) = F ∘ Π_K(b)
    Let Img[i] ← F ∘ Π_K(a)
    Let Pr_1[i] ← Π_K(a)
    Let Pr_2[i] ← Π_K(b)
  end for
  Sort Img, applying the same permutation on elements of Pr_1 and Pr_2

Optional step (packing of existing collisions):
  Let i ← 1
  while i < N_α do
    Let j ← i + 1
    while j ≤ N_α and Img[i] == Img[j] do
      if Pr_1[i] ≠ Pr_1[j] then
        if Pr_2[i] ≠ Pr_1[j] then
          Output ‘3-Collision (Pr_1[i], Pr_2[i], Pr_1[j]) under F’ and Exit
        end if
      end if
      Let j ← j + 1
    end while
    Let i ← j
  end while

Second step:
  for i from 1 to N_β do
    Let a ←_R [0, N - 1]
    Let b ← F(a)
    if b is in Img (first occurrence in position j) then
      if Pr_1[j] ≠ a then
        if Pr_2[j] == ⊥ then
          Let Pr_2[j] ← a
        else
          if Pr_2[j] ≠ a then
            Output ‘3-Collision (Pr_1[j], Pr_2[j], a) under F’ and Exit
          end if
        end if
      end if
    end if
  end for

3. Which logarithmic factors are hidden in the $\tilde{O}$ expression?

4. In the first step of Algorithm 2, how can we make sure that we never
encounter a bad configuration where the cycle finding algorithm runs for longer
than $O(N^{1/2})$?


To answer the first question, remark that each candidate stored into Img is a
random point that has at least one preimage for Algorithm 1 or at least two
preimages for Algorithm 2. According to [4], we know that the expected fraction
of points with exactly k distinct preimages is $e^{-1}/k!$. As a consequence, if
we denote by $P_k$ the fraction of points with at least k preimages, we find:

$$P_1 = 1 - \frac{1}{e}, \quad P_2 = 1 - \frac{2}{e} \quad \text{and} \quad P_3 = 1 - \frac{5}{2e}.$$

The expected fraction of elements from Img which can be correctly completed
into a 3-collision is $P_3/P_1 \approx 0.127$ for Algorithm 1 and
$P_3/P_2 \approx 0.304$ for Algorithm 2. To compensate for this loss, the easiest
approach is to make the stored set larger by a factor of about 8 in the first
case and about 3 in the second.
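The two ratios quoted above can be checked numerically. The short Python sketch below sums the probabilities $e^{-1}/j!$ for $j < k$ and subtracts from 1; the helper name `fraction_with_at_least` is an illustrative assumption.

```python
import math


def fraction_with_at_least(k):
    """Expected fraction of points of a random mapping with >= k preimages.

    Uses the fact that a fraction exp(-1)/j! of points has exactly j preimages.
    """
    return 1.0 - sum(math.exp(-1) / math.factorial(j) for j in range(k))


p1, p2, p3 = (fraction_with_at_least(k) for k in (1, 2, 3))

if __name__ == "__main__":
    print(f"P3/P1 = {p3 / p1:.3f}")  # completion rate for Algorithm 1
    print(f"P3/P2 = {p3 / p2:.3f}")  # completion rate for Algorithm 2
```

The reciprocals of these ratios give the constant factors by which the stored set has to grow.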

We now turn to the second question. Of course, at this point, the candidates
that cannot be correctly completed need to be ignored. Among the original set
of $N_\alpha$ candidates, we now focus on the subset of candidates that can
correctly be completed and let $N'_\alpha$ denote the size of this subset. Since
in the second phase we are sampling points uniformly at random, the a posteriori
probability of having chosen one of the two already known preimages is at most
$2/k$, where k is the number of distinct preimages for this point. Since
$k \geq 3$, the a posteriori probability of choosing a new preimage is at least
1/3. Similarly, for Algorithm 1 the a posteriori probability of choosing a
preimage distinct from the single originally known one is at least 2/3. To offset
this loss of probability, $N_\beta$ should be multiplied by a constant factor
of 3.

The logarithmic factors involved in the third question are easy to find: they
simply come from the sort and binary search steps. Note that when
$N^\alpha \cdot \log(N^\alpha) < N^\beta$, the sort operation costs less than the
second step and can be ignored. Moreover, as soon as $\alpha < \beta$, this bound
is asymptotically achieved when N tends to infinity. However, the binary search
appears within the second step and a real penalty is paid.

If we are willing to spend some extra memory (blowing up the memory by a
constant factor), this cost can be eliminated using hashing techniques. To
cover the case of $N^\alpha \cdot \log(N^\alpha) \geq N^\beta$, we need a data
structure with constant-time insert and lookup operations. One such data
structure is “cuckoo hashing”, where lookup operations need worst-case constant
time, and insert operations need expected constant time, as long as less than
half of the memory slots are used. However, for typical applications, the cost of
the binary search ought to remain small, compared to the cost of evaluating the
function F. Thus, in practice, we expect only a tiny benefit from using hash
tables.


2 Furthermore, delete operations only need worst-case constant time, and recent
improvements even enable update operations in worst-case constant time.
The simplest answer to the fourth question is to fix some upper bound on
the allowed running time of each individual call to the cycle finding algorithm
that produces a collision. If the running time is exceeded, we abort and restart
with a fresh permutation $\Pi_K$. With a time limit of the form
$\lambda \sqrt{N}$ and a large enough value of $\lambda$, we make sure that each
individual call to the cycle finding algorithm runs in time $O(N^{1/2})$ and the
probability of success is a constant close to 1, say larger than 2/3.


5 A Second Algorithm with More Tradeoff Options 


The algorithm presented in Section 3 only works for memory up to $N^{1/4}$. This
limitation is due to the way the collisions are generated during the first step
of Algorithm 2. In order to extend the range of possible tradeoffs beyond that
point, it suffices to find a replacement for this first step. Indeed, the second
step clearly works with a larger value of $\alpha$, as long as we keep the
relation $\alpha + \beta = 1$. Of course, since no 3-collision is expected before
we have performed $N^{2/3}$ evaluations of F, the best we can hope for is an
algorithm with running time $N^{2/3}$. Such an algorithm may succeed if we can
precompute a table containing $N^{1/3}$ ordinary collisions.

In this section, we consider the problem of generating $N^{1/3}$ collisions in
time bounded by $\tilde{O}(N^{2/3})$ using at most $\tilde{O}(N^{1/3})$ bits of
memory. Surprisingly, a simple method inspired from Hellman’s time-memory
tradeoff [5] is able to solve this problem. More generally, for
$\alpha \leq 1/3$, this method allows us to compute $N^\alpha$ collisions in time
less than $\tilde{O}(N^{1-\alpha})$ using at most $\tilde{O}(N^\alpha)$ bits of
memory. The idea is to first build $N^\alpha$ chains of length $N^\gamma$; each
chain starts from a random point and is computed by repeatedly applying F up to
the $N^\gamma$-th iteration. The end-point of each chain is stored together with
its corresponding start-point. Once the chains have been built, we sort them by
end-point values. Then, restarting from $N^\alpha$ new random points, we once
again compute chains of length $N^\gamma$. The difference is that we now test
after each evaluation of F whether the current value is one of the known
end-points. In that case, we know that the chain we are currently computing has
merged with one chain from the precomputation step. Such a merge usually
corresponds to a collision; the only exception occurs when the start-point of the
current chain already belongs to a precomputed chain (a “Robin Hood” using the
terminology of [27]). Then, backtracking to the beginning of both chains, we can
easily construct the corresponding collision. A pseudo-code description of this
alternative first step is given as Algorithm 3.

Note that, instead of building two sets of chains, it is also possible to build a 
single set and look for previously known end-points. This alternative approach is 
a bit trickier to implement but uses fewer evaluations of F. However, the overall 
cost of the algorithm remains within the same order. 

Clearly, since each of the two sets of chains we are constructing contains
$N^{\alpha+\gamma}$ points, the expected number of collisions is
$O(N^{2\alpha+2\gamma-1})$. Remembering that we wish to construct $N^\alpha$
collisions, we need to let $\gamma = (1-\alpha)/2$. The running time necessary to
compute these collisions is $N^{\alpha+\gamma} = N^{(1+\alpha)/2}$. Note that, since
Algorithm 3. Alternative method for constructing N^α collisions
Require: Oracle access to F operating on [0, N - 1]
Require: Parameter: α ≤ 1/3
  Let γ ← (1 - α)/2
  Let N_α ← ⌈N^α⌉
  Let N_γ ← ⌈N^γ⌉
  Create arrays Start and End of N_α elements.
  Create arrays Img, Pr_1 and Pr_2 of N_α elements.

Construction of first set:
  for i from 1 to N_α do
    Let a ←_R [0, N - 1]
    Let Start[i] ← a
    for j from 1 to N_γ do
      Let a ← F(a)
    end for
    Let End[i] ← a
  end for
  Sort End, applying the same permutation on elements of Start

Construction of second set and collisions:
  Let t ← 1
  while t ≤ N_α do
    Let a ←_R [0, N - 1]
    Let b ← a
    for j from 1 to N_γ do
      Let b ← F(b)
      if b is in End (first occurrence in position k) then
        Let a' ← Start[k]
        for l from 1 to N_γ - j do
          Let a' ← F(a')
        end for
        if a ≠ a' then
          {Checks that a genuine merge between chains exists}
          Let b ← F(a)
          Let b' ← F(a')
          while b ≠ b' do
            Let a ← b
            Let a' ← b'
            Let b ← F(a)
            Let b' ← F(a')
          end while
          Let Img[t] ← b
          Let Pr_1[t] ← a
          Let Pr_2[t] ← a'
          Let t ← t + 1
        end if
        Exit Loop on j
      end if
    end for
  end while
  Return arrays Img, Pr_1 and Pr_2 containing N_α collisions.
$\alpha \leq 1/3$, we have $(1+\alpha)/2 \leq 1-\alpha$. As a consequence, the
running time of the complete algorithm is dominated by the running time
$N^\beta = N^{1-\alpha}$ of the second step.
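The chain-based construction of Algorithm 3 can be sketched in Python as follows. This is a toy illustration under stated assumptions: F is a seeded random table, a dictionary replaces the sorted End array, and a `max_tries` bound (not part of the paper's pseudocode) guards against non-termination on unlucky inputs. All names are hypothetical.

```python
import random


def make_random_function(n, seed):
    """Toy stand-in for the oracle F: a fixed random table on [0, n - 1]."""
    rng = random.Random(seed)
    table = [rng.randrange(n) for _ in range(n)]
    return lambda x: table[x]


def chain_collisions(f, n, n_alpha, n_gamma, rng, max_tries=None):
    """Return up to n_alpha triples (image, x, y) with x != y, f(x) == f(y)."""
    if max_tries is None:
        max_tries = 1000 * n_alpha

    # First set: n_alpha chains of length n_gamma, indexed by end-point.
    ends = {}
    for _ in range(n_alpha):
        start = rng.randrange(n)
        x = start
        for _ in range(n_gamma):
            x = f(x)
        ends.setdefault(x, start)  # keep the first chain for each end-point

    # Second set: new chains, tested against the known end-points.
    collisions = []
    tries = 0
    while len(collisions) < n_alpha and tries < max_tries:
        tries += 1
        a = rng.randrange(n)
        b = a
        for j in range(1, n_gamma + 1):
            b = f(b)
            if b in ends:
                other = ends[b]
                for _ in range(n_gamma - j):
                    other = f(other)
                # a and other are now both exactly j steps before b.
                if a != other:  # genuine merge, not a "Robin Hood"
                    x, y = a, other
                    fx, fy = f(x), f(y)
                    while fx != fy:  # walk in lockstep to the merge point
                        x, y = fx, fy
                        fx, fy = f(x), f(y)
                    collisions.append((fx, x, y))
                break
    return collisions
```

Advancing the stored chain by `n_gamma - j` steps synchronizes the two chains, so the first position where their images agree yields two distinct preimages of the same value.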


6 Parallelizable 3-Collision Search 


Since the computation involved during a search for 3-collisions is massive, it
is essential to study the possibility of parallelizing such a search. For ordinary
collisions, parallelization is studied in detail in [27], using ideas introduced
in earlier work.

We first remark that the algorithms we have studied up to this point are badly
suited to parallelization. Their main problem is that a large amount of memory
needs to be replicated on every processor, which is very impractical, especially
when we want to use a large number of low-end processors. We now propose an
algorithm specifically suited to parallelization. For simplicity of exposition, we
first assume that $N_P \approx N^{1/3}$ processors are available and aim at a
running time $\tilde{O}(N^{1/3})$. Moreover, we would like each processor to use
only a constant amount of memory. However, we assume that every processor can
efficiently communicate with every other processor, as long as the amount of
transmitted data remains small. It would be easy to adapt the approach to a
network of small processors, with each processor connected to a central computer
possessing $\tilde{O}(N^{1/3})$ bits of memory.

As for ordinary collisions, the key idea is to use distinguished points. By def- 
inition, a set of distinguished points is a set of points together with an efficient 
procedure for deciding membership. For example, the set of elements in [0, M—1] 
can be used as a set of distinguished points since membership can be tested us- 
ing a single comparison. Moreover, with this choice, the fraction of distinguished 
points among the whole set is simply M/N. Here, since we wish to have chains 
of average length $N^{1/3}$, we choose for M an integer near $N^{2/3}$.

The distinguished point algorithm works in two steps. During the first step,
each processor starts from a random start-point s and iteratively applies F
until a distinguished point d is encountered. It then transmits a triple (s, d, L),
where L is the length of the path from s to d, to the processor whose number
is d (mod $N_P$). We abort any processor if it doesn’t find a distinguished point
within a reasonable amount of time. For example, as is done for 2-collisions, we
may abort after 20 N/M steps. Once all the paths have been computed, we start the
second step. Each processor looks at the triples it now holds. If a given value of
d appears three or more times, the processor recomputes the corresponding chains,
using the known length information to synchronize the chains. If three of the
chains merge at a common position, a 3-collision is obtained.

Of course, even with fewer than $N^{1/3}$ processors, it is possible to do a
partial parallelization. More precisely, given $N^\theta$ processors with
$\theta \leq 1/3$, it is possible to find 3-collisions in time
$\tilde{O}(N^{2/3-\theta})$. In that case, each processor needs a local memory of
size $\tilde{O}(N^{1/3-\theta})$ to store all the triples it owns.
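The two steps of the distinguished point method can be simulated sequentially, as in the following Python sketch. It is an illustration under stated assumptions rather than a parallel implementation: all chains are produced in one loop, a dictionary keyed by the distinguished point plays the role of the processor that receives the triples, and F is a seeded random table. The names are hypothetical.

```python
import random
from collections import defaultdict


def make_random_function(n, seed):
    """Toy stand-in for the oracle F: a fixed random table on [0, n - 1]."""
    rng = random.Random(seed)
    table = [rng.randrange(n) for _ in range(n)]
    return lambda x: table[x]


def dp_3collision(f, n, m, num_chains, rng, abort_factor=20):
    """Return (image, (x, y, z)) with three distinct preimages, or None."""
    max_len = abort_factor * (n // m)

    # Step 1: walk from random starts until a distinguished point (< m).
    groups = defaultdict(list)  # distinguished point -> [(start, length)]
    for _ in range(num_chains):
        start = rng.randrange(n)
        a = start
        for length in range(1, max_len + 1):
            a = f(a)
            if a < m:
                groups[a].append((start, length))
                break  # chains that never hit a distinguished point are dropped

    # Step 2: recompute groups of >= 3 chains, synchronized by length.
    for chains in groups.values():
        if len(chains) < 3:
            continue
        current = [start for start, _ in chains]
        remaining = [length for _, length in chains]
        for level in range(max(remaining), 0, -1):
            by_image = defaultdict(set)  # image -> distinct predecessors
            for k in range(len(chains)):
                if remaining[k] == level:  # exactly `level` steps before d
                    previous = current[k]
                    current[k] = f(previous)
                    remaining[k] -= 1
                    by_image[current[k]].add(previous)
            for image, preimages in by_image.items():
                if len(preimages) >= 3:
                    return image, tuple(sorted(preimages))[:3]
    return None
```

Only chains at the same distance from the common distinguished point are stepped together, which is what the stored length information makes possible.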
Algorithm 4. Parallelizable 3-collisions using distinguished points
Require: Oracle access to F operating on [0, N - 1]
Require: Number of processors N_P ≤ N^{1/3}
Require: Identity of current processor: Id ∈ [0, N_P - 1]

  Let M ← ⌈N^{2/3}⌉ {M defines distinguished points}
  Let L_max ← 20 ⌈N^{1/3}⌉

Construction of triples:
  Let s ←_R [0, N - 1]; a ← s; L ← 0
  while L < L_max do
    Let a ← F(a); L ← L + 1
    if a < M then
      Send triple T ← (s, a, L) to processor a (mod N_P) and Exit Loop
    end if
  end while

Acquisition of triples:
  Store received triples (s, d, L) in local arrays A, D, ℒ numbered from 1 to K
  Sort D, applying the same permutation on elements of A and ℒ

Processing of triples:
  Let i ← 1
  while i < K do
    Let j ← i + 1
    while j ≤ K and D[j] = D[i] do
      Let j ← j + 1
    end while
    if j ≥ i + 3 then
      Let L ← max(ℒ[i], ..., ℒ[j - 1])
      for ℓ from L downto 0 do
        for k from i to j - 1 do
          if ℒ[k] > ℓ then
            Let D[k] ← A[k]; A[k] ← F(A[k])
            {D[k] overwritten to keep previous value of A[k]}
          end if
        end for
        Check for 3 equal values in A[i ... j - 1] with differing values of D
        If found, Output the 3-collision and Exit
      end for
    end if
    Let i ← j
  end while


7 Extension to r-Collisions, for r > 3 


For r-collisions, recall that we need to evaluate F on approximately
$(r!)^{1/r} N^{(r-1)/r}$ points before hoping for a collision. When considering
that r is a fixed value, $(r!)^{1/r}$ is a constant and vanishes within the
$\tilde{O}$ notation. With this new context, Algorithm 4 is quite easy to
generalize. Here, the important point is to create shorter chains and compute
more of them. The reason for shorter chains is that (as in Hellman’s
Algorithm [5]), we need to make sure that there are not too many collisions
between one chain and all the others. Otherwise, the algorithm spends too much
time recomputing the same evaluations of the random map, which is clearly a bad
idea. To avoid this, we construct chains which are short enough to make sure that
the average number of (initial) collisions between an individual chain and all
the other chains is a constant. Since the total number of elements in all the
other chains is essentially $N^{(r-1)/r}$, the length of chains should remain
below $N^{1/r}$.

To achieve maximal parallelization when searching for an r-collision,
$N_P \approx N^{(r-2)/r}$ processors are required. The integer M that defines
distinguished points should be near $N^{(r-1)/r}$. Each processor first builds a
chain of average length $N^{1/r}$ (as before we abort after 20 N/M steps),
described by a triple (s, d, L). Each chain is sent to the processor whose number
is d (mod $N_P$). During the second step, any processor that holds a value of d
that appears in r or more triples recomputes the corresponding chains. If r chains
merge at the same position, an r-collision is obtained.

Given $N^\theta$ processors with $\theta \leq (r-2)/r$, it is possible to find
r-collisions in time $\tilde{O}(N^{(r-1)/r-\theta})$. In that case, each processor
needs a local memory of size $\tilde{O}(N^{(r-2)/r-\theta})$.

With a single processor, the required amount of memory is
$\tilde{O}(N^{(r-2)/r})$. Thus, as r grows, the advantage of the single processor
approach over the folklore algorithm (which requires $\tilde{O}(N^{(r-1)/r})$
memory) becomes smaller and smaller. As a consequence, for larger values of r, it
is essential to rely on parallelization.
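The exponents stated in this section can be tabulated with a small helper. The following Python sketch is only a convenience for reading off the formulas above; the function name and the dictionary keys are illustrative assumptions.

```python
from fractions import Fraction


def r_collision_exponents(r, theta=0):
    """Exponents (as powers of N) for the distinguished point r-collision search.

    theta is the exponent of the number of processors, 0 <= theta <= (r-2)/r.
    """
    theta = Fraction(theta)
    max_theta = Fraction(r - 2, r)
    if r < 3 or not 0 <= theta <= max_theta:
        raise ValueError("need r >= 3 and 0 <= theta <= (r - 2)/r")
    return {
        "time": Fraction(r - 1, r) - theta,          # N^((r-1)/r - theta)
        "local_memory": max_theta - theta,           # N^((r-2)/r - theta)
        "folklore_memory": Fraction(r - 1, r),       # N^((r-1)/r)
    }


if __name__ == "__main__":
    for r in (3, 4, 8):
        print(r, r_collision_exponents(r))
```

With `theta = 0` the gap between `local_memory` and `folklore_memory` is always 1/r, which shrinks as r grows, in line with the remark above.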


8 Conclusion 


In this paper, we revisited the problem of constructing multicollisions on random
mappings and showed that it can be done using less memory than required
by the folklore algorithm. For 3-collisions, the sequential running time remains
at $\tilde{O}(N^{2/3})$ but the amount of memory can be reduced from
$\tilde{O}(N^{2/3})$ to $\tilde{O}(N^{1/3})$. A remaining open problem is to
determine whether this amount of memory can further be reduced.

Furthermore, finding 3-collisions can be very efficiently parallelized. Given
$N^{1/3}$ parallel processors, each equipped with constant memory, the problem
can be solved in time $\tilde{O}(N^{1/3})$. More generally for $r > 3$, we show
how to generate r-collisions on $N^\theta$ processors, each with local memory
$\tilde{O}(N^{(r-2)/r-\theta})$, in time $\tilde{O}(N^{(r-1)/r-\theta})$. It is
interesting to note that the cost of the parallelizable approach in the full-cost
model decreases as $\theta$ grows.


3 Of course, once a collision occurs, all the values that follow are colliding. However, 
we do not count these follow-up collisions. 
A Practical Implementation


Since we only performed a heuristic analysis of our algorithms, in order to show
that they are really effective, we decided to illustrate our 3-collision techniques
with a practical example. For this purpose, we construct a random function by
XORing two copies of the DES algorithm (with two different keys). More precisely,
we let:

$$F(x) = \mathrm{DES}_{K_1}(x) \oplus \mathrm{DES}_{K_2}(x),$$

where $K_1 = (3322110077665544)_{16}$ and $K_2 = (3b2a19087f6e5d4c)_{16}$. Since x
is on 64 bits, the time and memory requirements of the folklore algorithm are
around $2^{43}$. Where current computers are concerned, performing $2^{43}$
operations is easily feasible. However, storing $2^{43}$ values of x requires
$2^{46}$ bytes, i.e. 64 Terabytes. As a consequence, finding 3-collisions on F
with the basic parameters of the folklore algorithm is probably beyond
feasibility. Using a different time-memory trade-off, restricting the storage to
$2^{32}$ values would raise the time requirement to $2^{48}$ operations. This is
within the range of currently accessible computations. However, since the
algorithm is not parallelizable, it would require a high-end computer.


4 These keys might seem weird, but they should not have any special properties.
In truth, we intended to choose $K_1 = (0011223344556677)_{16}$ and
$K_2 = (08192a3b4c5d6e7f)_{16}$, i.e., $(8899aabbccddeeff)_{16}$ with high bits
stripped. Unfortunately, the first-named author made a classical endianness
mistake while implementing the algorithm.
With the new algorithms presented in this paper, it becomes possible to compute
triple collisions much more efficiently on the function F. For our
implementation, we chose $M = 2^{44}$ to define the distinguished points, which
yielded chains of expected length $2^{20}$. The abort length was set at 8 times
the expected length, rather than the factor 20 given in Algorithm 4. For
computing the chains, we used a mix of 32 Intel Xeon processors at 2.8 GHz and 8
Nvidia CUDA cards (Tesla type). We collected a total of 35447322 chains and
obtained 3078699 groups of three or more chains yielding the same distinguished
endpoints. The largest group contained 36 chains, which shows that it would have
been preferable to use slightly shorter chains. On processors only, this first
phase would have taken about 94 CPU-days to run. On a single CUDA card, it would
have taken 11.5 days.

For simplicity of implementation, the second phase of the algorithm was only 
performed on Intel processors and not on CUDA cards. It took less than 18 
CPU-days to test all groups and it yielded the following triple-collisions: 


F(d332b9bade5a7d4e) = F(51b8095db532afcc) = F(b084dc15dce042ab),
F(ca76ff906d6587cf) = F(e1f7f59a5757d01b) = F(0285f58147€863c2),
F(c3783ef30c8bcc3d) = F(65f14d412fd91173) = F(1042d827e5078000).


We would like to thank CEA/DAM for kindly providing the necessary computing
time on its Tesla servers.


B Applications 


B.1 Collisions for the Hash Function AURORA-512 


AURORA is a family of cryptographic hash functions submitted to the NIST 
SHA-3 hash function competition [8]. Like the other members of the AURORA 
family, AURORA-512 employs different internal compression functions, each 
mapping a 256-bit chaining value and a 512-bit message block to generate a new 
256-bit chaining value. AURORA-512 is the high-end member of that family,
maintaining an internal state of 512 bits. As required by the NIST, the authors
of AURORA-512 explicitly claim “collision resistance of approximately 512/2
bits” for AURORA-512. In other words, collision attacks must not significantly
improve over the generic birthday attack, which takes roughly the time of
$2^{256}$ hash operations.

Internally, AURORA-512 works almost like the cascade of two iterated hash 
functions, except for one important extra operation: 


$$\mathrm{MF} : \{0,1\}^{256} \times \{0,1\}^{256} \to \{0,1\}^{256} \times \{0,1\}^{256}.$$


See Algorithm 5 for a simplified description of AURORA-512.
Every eighth iteration, MF is called to mix the two half-states. This seems
to defend against the cascade-attack from [9]: Between two MF-operations, one

5 Commissariat à l’énergie atomique, Direction des applications militaires. 
Algorithm 5. AURORA-512: Hashing 8 message blocks.
Require: Input Chaining Values (Left, Right) ∈ ({0, 1}^256)^2
  for i from 0 to 7 do
    Left ← Compress(Left, Message_Block(i))
    Right ← Compress(Right, Message_Block(i))
  end for
  (Left, Right) ← MF(Left, Right)


can generate local collisions in each iteration in one of either the left string, or
the right string. Thus, the adversary can get a local $2^8$-collision. But to apply
the attack from [9], one would rather need a $2^{128}$-collision, so the attack
fails.

Assume, for a moment, that the adversary has generated a $2^7$-collision on Left
in the first 7 iterations of the loop. For the right string, we have $2^7$
different values $\mathrm{Right}_1, \mathrm{Right}_2, \ldots, \mathrm{Right}_{2^7}$.
If two of them collide, a collision for AURORA-512 has been found. For a fixed
Message_Block(7), the chance of a collision, i.e. of $j \neq k$ with

$$\mathrm{Compress}(\mathrm{Right}_j, \mathrm{Message\_Block}(7)) = \mathrm{Compress}(\mathrm{Right}_k, \mathrm{Message\_Block}(7)),$$

is about $2^7 \cdot (2^7 - 1) \cdot 2^{-1} / 2^{256}$. By trying out
$2^{256-(6+7)}$ different values for Message_Block(7), we expect to find a
collision. Note that each trial means making $2^7$ calls to the function
Compress. Hence, this attack takes the time of about
$2^{256-(6+7)+7} = 2^{250}$ compression function calls, plus the time to generate
the $2^7$-collision at the beginning. This is essentially the memoryless variant
of the attack by Ferguson and Lucks, except that they actually generate a
$2^8$-collision on Left, by exploiting the previous eight-tuple of message
blocks. The attack is memoryless, since the adversary only needs to generate
2-collisions on Left, and the claimed time is $2^{249}$.

Ferguson and Lucks further propose an attack which uses local r-collisions
instead of local 2-collisions. A similar attack has been proposed independently.
Using eight local r-collisions allows one to speed up the attack to roughly
$2^{256}/r^7$ compression function calls (plus the time to generate the required
r-collisions). Ferguson and Lucks suggest r = 9 (beyond that, computing the
r-collisions becomes too costly) and claim time 27%*°, including the time to
generate ten local 9-collisions. The price for the speed-up, however, is a huge
amount of memory.

Our memory-efficient 3-collision algorithm allows a different time-memory
tradeoff. The time is $2^{256}/3^7 \approx 2^{245}$. Recall $N = 2^{256}$, and set
$\alpha := 1/16$, $\beta := 15/16$ in Algorithm 2. In that case one local
3-collision requires time $2^{240}$, which we neglect. The memory requirements are
down to $2^{16}$, i.e., almost negligible.

It is also possible to use more general r-collisions to further improve this
attack. For example, we can use 4-collisions obtained using the algorithm of
Section 7. To simplify the comparison with previous attacks, we assume a single
processor, i.e. set $\theta = 0$. With more processors, we would obtain an
even better attack. With this choice, a 4-collision on 256 bits is obtained in
time $2^{192}$ using a memory of size $2^{128}$. The corresponding speedup is
$4^7$. Similarly, 8-collisions on 256 bits are obtained in time $2^{224}$ each,
using $2^{192}$ units of memory. The speed-up is $8^7$. Other trade-offs are
possible.
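The attack times quoted for r = 3, 4 and 8 follow from dividing $2^{256}$ by the speed-up $r^7$. The Python sketch below reproduces this arithmetic in base-2 logarithms; the function name and its default arguments are illustrative assumptions, with the exponent 7 taken from the speed-up factors stated in the text.

```python
import math


def aurora_attack_log2_time(r, state_bits=256, speedup_exponent=7):
    """log2 of the number of compression calls, i.e. log2(2^state_bits / r^7).

    The cost of generating the local r-collisions themselves is not included.
    """
    return state_bits - speedup_exponent * math.log2(r)


if __name__ == "__main__":
    for r in (3, 4, 8):
        print(f"r = {r}: about 2^{aurora_attack_log2_time(r):.1f} calls")
```

For r = 3 the exact value is slightly below 245, which is why the text writes the time as approximately $2^{245}$.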

The results on collision attacks for AURORA-512 can be summarised as 
follows: 


r    time         memory       reference
3    $2^{245}$    $2^{16}$     (this paper)
4    $2^{242}$    $2^{128}$    (this paper)
8    $2^{235}$    $2^{192}$    (this paper)


B.2 Attacks on Other Hash Functions 


Several attacks on several other SHA-3 candidates make heavy use of multicol- 
lisions, and it appears a natural idea to plug in our algorithms for reducing the 
memory consumption of these attacks. We actually tried to do so, but only suc- 
ceeded for Aurora-512. In the current section, we will explain why we failed for 
other obvious candidates. 

Several attacks, such as the attacks on Blender and on Twister, employ
multicollisions, but it turns out that these can actually be generated by
Joux-style iterated 2-collisions, which is very memory-efficient and, in any
case, faster than our general multicollision algorithms.

An obvious candidate to employ our algorithms to improve given cryptanalytic 
attacks is a preimage attack on JH-512 [TQ]. Like Aurora, JH is a family of hash 
functions submitted to the SHA-3 competition. The high-end 512-bit variant 
is denoted as JH-512. Internally, JH-512 is a wide-pipe hash function with an 
internal state of 1024 bit, and it employs an invertible compression function. 
propose a meet-in-the-middle attack which requires “2510-3 compression function 
evaluations and a similar amount of memory” (our emphasis). The authors of 
[O] stress: “We do not claim that our attack breaks JH-512 (due to the high 
memory requirements).” The author of JH-512 provides a more detailed analysis 
of this attack, claiming “2510-6 [units of] memory”. A main phase of the attack 
is generating several 51-collisions on one half of the chaining values (i.e., on 
512 bits). By applying our algorithms to this task, it is possible to reduce the
memory required for this phase to $2^{(512/51) \cdot 49}$ units of memory.

But another phase of the attack from [I] is to apply the inverse of the
compression function to generate $2^{509}$ internal target values. The attack
successfully generates a preimage of a given hash value if the first part of the
message hashes to any of these $2^{509}$ target values. Consequently, the overall
amount of storage for the attack is dominated by storing these $2^{509}$ values,
regardless of any improvement in the memory efficiency of the multicollision phase.
Abstract. The design of cryptographic hash functions is a very complex
and failure-prone process. For this reason, this paper puts forward a 
completely modular and fault-tolerant approach to the construction of a 
full-fledged hash function from an underlying simpler hash function H 
and a further primitive F (such as a block cipher), with the property 
that collision resistance of the construction only relies on H, whereas 
indifferentiability from a random oracle follows from F being ideal. In 
particular, the failure of one of the two components must not affect the 
security property implied by the other component. 

The Mix-Compress-Mix (MCM) approach by Ristenpart and Shrimpton
(ASIACRYPT 2007) envelops the hash function H between two injective
mixing steps, and can be interpreted as a first attempt at such a
design. However, the proposed instantiation of the mixing steps, based 
on block ciphers, makes the resulting hash function impractical: First, it 
cannot be evaluated online, and second, it produces larger hash values 
than H, while only inheriting the collision-resistance guarantees for the 
shorter output. Additionally, it relies on a trapdoor one-way permutation, 
which seriously compromises the use of the resulting hash function for 
random oracle instantiation in certain scenarios. 

This paper presents the first efficient modular hash function with 
online evaluation and short output length. At the core of our approach
are novel block-cipher-based designs for the mixing steps of the MCM
approach which rely on significantly weaker assumptions: The first mix- 
ing step is realized without any computational assumptions (besides the 
underlying cipher being ideal), whereas the second mixing step only re- 
quires a one-way permutation without a trapdoor, which we prove to be 
the minimal assumption for the construction of injective random oracles. 


1 Introduction 


MULTI-PROPERTY HASH FUNCTIONS. Cryptographic hash functions play a central
role in efficient schemes for several cryptographic tasks, such as message
authentication, public-key encryption, digital signatures, key derivation, and many
others. Yet the huge variety of contexts in which hash functions are deployed makes
the security requirements on them very diverse: While some schemes only assume
relatively simple properties such as one-wayness or different forms of collision
resistance, other schemes, including practical ones such as OAEP and PSS [8],
are only proven secure under the assumption that the underlying hash function is
a random oracle [B], i.e., a truly random function which can be evaluated by the
adversary. On the one hand, while a number of provably secure collision-resistant
hash functions, such as VSH [p] or SWIFFT [18], have been designed, they are
not appropriate candidates for random oracle instantiation. On the other hand,
well-known theoretical limitations only permit constructions of hash functions
for random oracle instantiation from idealized primitives [IQ], such as a
fixed-input-length random oracle or an ideal cipher,¹ but (as first pointed out
in [B]) these constructions may lose any security guarantees as soon as the
adversary gets to exploit non-ideal properties of the underlying primitive.²

While one could in principle always employ a suitable hash function tailored to
the individual security property needed by one particular cryptographic scheme 
at hand, common practices such as code re-use and the development of standards 
call for the design of a single hash function satisfying as many properties as 
possible. This point of view has also been adopted by NIST’s on-going SHA-3 
competition [[7], and motivated a series of works shifting the design problem 
of multi-property hash functions to the task of constructing good multi-property 
compression functions. A further line of research has been devoted to robust 
multi-property combiners [I3], which merge two hash functions such that the 
resulting function satisfies each of the properties possessed by at least one of 
the two starting functions. While these works simplify the design task, building 
multi-property hash functions from single-property primitives remains far from 
being simple, and is the main topic of this paper. 


STATEMENT OF THE MAIN PROBLEM. This paper presents a modular design
for hash functions that are collision resistant in the standard model and can,
simultaneously, be used for random oracle instantiation in the ideal model. We
consider a setting where both a hash function H as well as some other
(potentially ideal) primitive F (such as a block cipher) are given (a similar setup
was previously considered by Ristenpart and Shrimpton [23]): We aim at
devising a construction $C^{H,F}$ which is collision resistant as long as H is
collision resistant³ and which behaves as a random oracle (with respect to the
notion of indifferentiability [[9L0)) whenever F is ideal. For this approach to be
practically appealing, the construction must preserve the good properties of H: For
instance, it must allow for online processing of data (which is crucial for large


1 An ideal cipher $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ associates an
(invertible) random permutation $E(k, \cdot)$ with each key k.

2 Of course, a real block cipher cannot be ideal. (Likewise, a hash function cannot be a
random oracle either.) Yet modeling it as ideal captures the adversary’s inability to
exploit any structure, and a security proof in this model implies in particular the
nonexistence of any generic attacks treating the block cipher as a black box.

3 In particular, we require the existence of a standard-model reduction. 
inputs or in streaming applications) whenever H can be evaluated online.⁴ Also,
the construction should not increase the size of the hashes of H.

In particular, we advocate a safe and modular design paradigm where each of
the two properties should ideally rely on only one of the two component primitives,
whereas the other primitive may be arbitrarily insecure, except for (possibly)
satisfying some minimal structural requirement (that can be ensured by design),
such as F being a permutation or H being sufficiently regular. This differs from
the point of view taken in [23], where H is guaranteed to be collision resistant
and is extended by means of an ideal primitive F into a random oracle, while
preserving the collision-resistance guarantees of H: We believe that practical
considerations, especially efficiency, may in fact motivate the use of hash
functions with no provable security guarantees. Thus, it is desirable that even the
ability to find collisions for H does not impact the indifferentiability of the
construction, as long as F is still ideal. Either way, both points of view are
related: Any solution satisfying our stronger requirements (including the one we
propose in this paper) also fits within the framework of [23], while the solution
proposed in [23] also satisfies our stronger requirements, as discussed below.

We also remark that using the multi-property combiner of [A] one can combine
a random oracle (built from F) and H into a hash function that provably
satisfies both properties. However, as combiners inherently do not exploit the
knowledge of which of the two functions has a certain property, the resulting
construction is rather inefficient, e.g., it doubles the output length.


THE MCM APPROACH. Given a hash function H as above, the so-called
mix-compress-mix (MCM) approach, introduced by Ristenpart and Shrimpton [23],
considers the construction


$\mathrm{MCM}^{M_1,H,M_2}(x) = M_2(H(M_1(x)))$,


where $M_1$ and $M_2$ are arbitrary-input-length injective maps (the so-called mixing
stages) with stretch $\tau_1$ and $\tau_2$, respectively, i.e., such that $M_i$ outputs
a string of length $|x| + \tau_i$ on input $x \in \{0,1\}^*$. The injectivity of the
mixing stages ensures that MCM preserves the collision resistance of H in the
standard model. Additionally, it was shown in [23] that MCM is indifferentiable from
a random oracle if $M_1$ and $M_2$ are random injective oracles (i.e., $M_i$ returns a
random $(|x| + \tau_i)$-bit string for each input $x \in \{0,1\}^*$ that differs from
all previously returned values with the same length) and H is collision resistant and
sufficiently regular. Dodis et al. subsequently interpreted this result as the
combination of two facts: (i) The mapping $x \mapsto H(M_1(x))$ is preimage aware⁵
under the same


4 Most hash functions rely on some iterated (and thus inherently online) design, such 
as Merkle-Damgard [OE]], or sponges [f]. 

5 Informally, a construction $C^F$ based on an ideal primitive F is preimage aware if
there exists an algorithm, called the preimage extractor, which given the
input-output history of F and an output y, either aborts or returns x such that
$C^F(x) = y$, and after such a query no adversary can find an input $x'$ such that
$C^F(x') = y$ (and $x' \neq x$ in case the extraction query did not abort).
assumptions, and (ii) Post-processing the output of a preimage-aware function
with a (possibly injective) random oracle yields a full-fledged random oracle. A 
concrete instantiation of injective random oracles — called the TE-construction 
— relying on an ideal cipher and a trapdoor one-way permutation has also been 
proposed in [23]: To date, this was the only known such construction. 

Interestingly, we observe that the MCM approach provides a modular design
approach for hash functions as advocated above, since the indifferentiability
result can be made independent of the collision resistance of H. (This was
unnoticed in [23], and is briefly discussed in the full version of this paper.)
However, its deployment is subject to a number of practical and theoretical
drawbacks, whose solution was stated as an open problem in [23]: First, no
construction of injective random oracles (and in particular not the
TE-construction) can be online, as, roughly speaking, each output bit needs to be
influenced by all of the input in order to exhibit random behavior. Additionally,
the fact that the TE-construction is length-increasing has a serious impact on the
resulting hash size: In particular, the stretch $\tau_i$ typically equals the bit
length of a sufficiently secure RSA modulus, i.e., $\tau_i \geq 2048$ bits for
reasonable security. Finally, the use of a trapdoor one-way permutation within the
TE-construction is rather undesirable: In contrast to (non-trapdoor) one-way
permutations, the assumption is very strong, e.g., it implies public-key encryption
in the random oracle model [4]. Also, as pointed out in [23], the compositional
guarantees of protocols using the MCM approach (with the TE-construction) to
instantiate a random oracle are affected, as properties such as deniability may be
lost (cf. e.g. the works by Pass and by Canetti et al.).

These observations give rise to a number of challenging open questions. Can we 
instantiate the first mixing stage of MCM with a weaker primitive which allows 
for online processing? Can we instantiate the second mixing stage (where online 
processing is not an issue) as an injective RO with limited stretch (possibly even 
with no stretch at all)? And finally, can we weaken the underlying assumption, 
eliminating the need of the trapdoor, or possibly even entirely removing the 
underlying assumption? 


CONTRIBUTIONS AND ROADMAP OF THIS PAPER. In this paper, we present 
the first efficient modular construction of a hash function in the sense described 
above. Our solution relies on the MCM approach, and in particular we address 
and solve all of the aforementioned open questions, and hence make a substantial 
step towards making the MCM approach practical. 


First Mixing Stage. In Section 3 we present a novel mode of operation for a
block cipher $E : \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$ implementing an
arbitrary-input-length injective map, called iterated mix (IM), that permits online
processing of its inputs, making only one call to E per n-bit message block, and has
only stretch n/2. Our first main theorem shows that the construction
$\mathrm{IMC}^{E,H}(M) := H(\mathrm{IM}^E(M))$ applying H to the output of IM is
preimage aware if E is an ideal cipher and, additionally, the hash function H
satisfies a rather weak regularity requirement (which is somewhat incomparable to
the one used in [23], albeit equally natural): Namely, given a random n-bit string m
and some arbitrary string S, the value H(S||m) has (min-)entropy not much lower than
n (if n is smaller than H’s hash size), or not much lower than the hash size
otherwise. In fact, even completely insecure hash functions can have this property,
and it is also natural to assume that it is satisfied by any reasonably built hash
function. We also present a variant of the IM-construction which requires a block
cipher with single-block key length n at the price of making two block-cipher calls
per message block.

We stress that (contrary to the TE-construction) our result does not rely
on any computational assumptions: In particular, the IM-construction relies on
invertible primitives, and is itself efficiently invertible. Thus, IM does not
implement a random injective oracle.


Second Mixing Stage. With the goal of making the MCM approach preserve the
hash size of the underlying hash function, the second part of this paper
(Section) addresses the question of building length-preserving injective random
oracles. (We call this a (non-invertible) random permutation oracle (RPO).)
We show that for any three permutations $E, E', \pi$ from n bits to n bits, the
permutation


$\mathrm{NIRP}^{E,E',\pi}(x) := E'(\pi(E(x)))$


is indifferentiable from a RPO if both E and E' are (fixed-key) ideal ciphers,
and $\pi$ is a one-way permutation, without a trapdoor.
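
The composition above can be tried out on a tiny domain. The following Python sketch is only an illustration under loudly stated assumptions: the domain is {1, ..., p-1} for a small prime p rather than n-bit strings, E and E' are seeded random permutations standing in for fixed-key ideal ciphers, and the stand-in for the one-way permutation is modular exponentiation with a generator, which is a permutation of that domain but is of course not one-way at this size. None of these choices are taken from the paper.

```python
import random

P = 257                       # small prime (toy parameter, an assumption)
G = 3                         # 3 generates the multiplicative group modulo 257
DOMAIN = list(range(1, P))    # toy domain {1, ..., 256} instead of n-bit strings

def random_permutation(seed):
    """Seeded random permutation of DOMAIN, stored as a lookup table."""
    rng = random.Random(seed)
    image = DOMAIN[:]
    rng.shuffle(image)
    return dict(zip(DOMAIN, image))

E1 = random_permutation(1)    # stands in for the fixed-key ideal cipher E
E2 = random_permutation(2)    # stands in for the fixed-key ideal cipher E'

def pi(x):
    """Toy stand-in for the one-way permutation: x -> G^x mod P."""
    return pow(G, x, P)

def nirp(x):
    """NIRP(x) = E'(pi(E(x)))."""
    return E2[pi(E1[x])]
```

Since each of the three maps is a permutation of the toy domain, so is their composition; the sketch exposes only the forward direction, mirroring the fact that a RPO offers no inversion interface.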

In practice, E and E' are instantiated by a block cipher with two distinct fixed
keys. This limits us to n being a valid block size (e.g. n = 128 bits), which
can be smaller than the usual hash size (e.g. h = 256). This motivates the
question of extending the input/output size of random permutation oracles: In
Section [2] we present constructions (which are reminiscent of the
Shrimpton-Stam compression function [25]) for extending every RPO from n bits to
n bits into a RPO from $\gamma \cdot n$ bits to $\gamma \cdot n$ bits for any fixed
$\gamma > 1$.

In the full version we further show that in order to construct injective ROs the 
assumption of a one-way permutation cannot be weakened to a one-way function 
(at least under black-box security reductions). 


Putting Pieces Together. Finally, instantiating MCM with IM and NIRP (or its 
extension through our extender) as its first and second mixing stage, respectively, 
leads to the first construction of a hash function with the following properties: 


(i) Its collision resistance can be reduced in the standard model to the collision 
resistance of the underlying hash function. 

(ii) It is indifferentiable from a random oracle in the ideal cipher model (with 
a one-way permutation), as long as the underlying hash function is suffi- 
ciently regular. 

(iii) It can be evaluated online as long as the underlying hash function can be 
evaluated in an online fashion. 

(iv) It has hash size equal to the one of the underlying hash function. 

(v) It can be used to instantiate a random oracle in all computationally secure 
schemes in the random oracle model, with no composability limitations. 
2 Preliminaries


NOTATIONAL PRELIMINARIES. Throughout this paper, $\{0,1\}^n$ denotes the set
of strings s of length |s| = n, whereas $(\{0,1\}^n)^*$ and $(\{0,1\}^n)^+$ are the
sets of strings consisting of n-bit blocks with and without the empty string,
respectively. The notation $s \| s'$ stands for the concatenation of the strings s
and s'. Also, we use INJ(m, n) to denote the set of injective functions
$f : \{0,1\}^m \to \{0,1\}^n$ (in particular, INJ(n, n) is the set of permutations
from n bits to n bits). Further, it is convenient to define $\mathrm{BC}(\kappa, n)$
as the set of block ciphers, i.e., of keyed functions
$E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ such that each key
$k \in \{0,1\}^\kappa$ defines a permutation
$E_k(\cdot) := E(k, \cdot) \in \mathrm{INJ}(n, n)$ (and denote as $E^{-1}(k, \cdot)$
the corresponding inverse).

Algorithms are in general randomized, and throughout this paper we fix a
RAM model of computation for these algorithms. We use the notation $A^O(r)$
to denote the (oracle) algorithm $A^{(\cdot)}$ which runs on input r with access to
the oracle O. In particular, an algorithm $A^{(\cdot)}$ is said to have running time t
(also denoted as time(A) = t) if the sum of its description length and the worst-case
number of steps it takes (counting oracle queries as single steps), taken over
all randomness values, all inputs and all compatible oracles, is at most t. If
the algorithm takes inputs of arbitrary length, then $\mathrm{time}(A, \ell)$ refines
the above notion to only take the maximum over inputs of length at most $\ell$.


Finally, the shorthand $x \stackrel{\$}{\leftarrow} S$ stands for the action of
drawing a fresh random element x uniformly from the set S, whereas
$x \stackrel{\$}{\leftarrow} A^O(r)$ denotes the process of sampling x by letting A
interact with O on input r (and probabilities are taken over the random coins of A
and O).


ONE-WAY FUNCTIONS AND PERMUTATIONS. We define the one-way advantage
of an adversary A against a function $f : \{0,1\}^m \to \{0,1\}^n$ as


$\mathrm{Adv}^{\mathrm{owf}}_f(A) = P[x \stackrel{\$}{\leftarrow} \{0,1\}^m, x' \stackrel{\$}{\leftarrow} A(f(x)) : f(x) = f(x')]$.


For the special case of a permutation $\pi : \{0,1\}^n \to \{0,1\}^n$, it is
convenient to use the shorthand
$\mathrm{Adv}^{\mathrm{owp}}_\pi(A) = P[x \stackrel{\$}{\leftarrow} \{0,1\}^n, x' \stackrel{\$}{\leftarrow} A(\pi(x)) : x = x']$
for the one-way permutation advantage.


IDEALIZED PRIMITIVES. We consider a number of (more or less) standard
idealized primitives throughout this paper, which are always denoted by bold-face
letters. For a set X, a random oracle (RO) $\mathbf{R} : X \to \{0,1\}^n$ is a system
associating a random n-bit string $\mathbf{R}(x)$ with each input x. If
$X = \{0,1\}^m$, then $\mathbf{R}$ is called a fixed-input-length RO (FIL-RO), whereas
it is a variable-input-length RO (VIL-RO) if $X = \{0,1\}^*$. An ideal cipher (IC)
$\mathbf{E} : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ is a block cipher chosen
uniformly from the set $\mathrm{BC}(\kappa, n)$, and allows both forward queries
$\mathbf{E}(k, x)$ as well as backward queries $\mathbf{E}^{-1}(k, y)$. If
$\kappa = 0$, then we omit the first input and we call this a fixed-key ideal cipher.
Note that for an IC $\mathbf{E}$ and distinct fixed key values $k_0, k_1, \ldots$, the
permutations $\mathbf{E}(k_0, \cdot), \mathbf{E}(k_1, \cdot), \ldots$ are independent
fixed-key ICs. In contrast, a (fixed-input-length) random injective oracle
(FIL-RIO) $\mathbf{I} : \{0,1\}^m \to \{0,1\}^n$ implements a uniformly chosen
function from INJ(m, n). In the special case m = n we call this a random permutation
oracle (RPO) $\mathbf{P} : \{0,1\}^n \to \{0,1\}^n$.

We stress that the substantial difference between a fixed-key IC and a RPO 
is that the former allows for inversion queries, whereas the latter does not (and 
is in particular hard to invert). 
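
The difference between the two interfaces can be made concrete with lazy sampling. The following Python sketch is a minimal illustration, not taken from the paper: the class names `IdealCipher` and `RandomPermutationOracle` are hypothetical, keys and blocks are represented as integers, and fresh values are drawn by rejection sampling, which is only sensible for experiments.

```python
import secrets

class IdealCipher:
    """Lazily sampled ideal cipher on n-bit blocks: one random permutation per key."""

    def __init__(self, n):
        self.n = n
        self.fwd = {}   # (key, x) -> y
        self.bwd = {}   # (key, y) -> x

    def _fresh(self, used):
        # Rejection-sample an n-bit value that is not yet used under this key.
        while True:
            v = secrets.randbits(self.n)
            if v not in used:
                return v

    def enc(self, k, x):
        """Forward query E(k, x)."""
        if (k, x) not in self.fwd:
            y = self._fresh({yy for (kk, yy) in self.bwd if kk == k})
            self.fwd[(k, x)] = y
            self.bwd[(k, y)] = x
        return self.fwd[(k, x)]

    def dec(self, k, y):
        """Backward query E^{-1}(k, y)."""
        if (k, y) not in self.bwd:
            x = self._fresh({xx for (kk, xx) in self.fwd if kk == k})
            self.fwd[(k, x)] = y
            self.bwd[(k, y)] = x
        return self.bwd[(k, y)]

class RandomPermutationOracle:
    """Same lazily sampled permutation under a single key, without inversion queries."""

    def __init__(self, n):
        self._ic = IdealCipher(n)

    def query(self, x):
        return self._ic.enc(0, x)
```

Both objects sample the same kind of random permutation; the only difference visible to a caller is that the second one offers no backward interface.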


INDIFFERENTIABILITY. The notion of indifferentiability was introduced by
Maurer et al. to generalize indistinguishability to constructions
$C^{\mathbf{F}} : X \to \{0,1\}^n$ using a public (idealized) primitive $\mathbf{F}$
(e.g., an IC, a FIL-RO, or a combination of these), i.e., one that can be accessed by
the adversary. Roughly speaking, $C^{\mathbf{F}}$ is indifferentiable from an ideal
primitive $\mathbf{F}'$ if there exists a simulator $S^{\mathbf{F}'}$ accessing
$\mathbf{F}'$ such that $(C^{\mathbf{F}}, \mathbf{F})$ and
$(\mathbf{F}', S^{\mathbf{F}'})$ are indistinguishable. In particular, we
will be concerned with the cases where $\mathbf{F}'$ is either a RO or a RIO/RPO, and
we define the RO-indifferentiability advantage of the distinguisher D against the
construction $C^{\mathbf{F}}$ and simulator S as the quantity


$\mathrm{Adv}^{\text{ro-indiff}}_{C,S}(D) = \left| P[D^{C^{\mathbf{F}}, \mathbf{F}} = 1] - P[D^{\mathbf{R}, S^{\mathbf{R}}} = 1] \right|$,


where $\mathbf{R} : X \to \{0,1\}^n$ is a RO with the same input and output sets as C.
The RIO-indifferentiability advantage $\mathrm{Adv}^{\text{rio-indiff}}_{C,S}$ is
defined analogously by using a RIO $\mathbf{I}$ instead of $\mathbf{R}$. We stress that
both quantities are related by a simple birthday-like argument, i.e.,
$\mathrm{Adv}^{\text{ro-indiff}}_{C,S}(D) \leq \mathrm{Adv}^{\text{rio-indiff}}_{C,S}(D) + \frac{1}{2} \cdot (q + q_S)^2 \cdot 2^{-n}$,
where q is the number of queries D makes to its first oracle, whereas $q_S$ is the
overall number of queries S makes when answering D’s queries. Note that
indifferentiability ensures composability, i.e., if a cryptographic scheme is secure
using an ideal primitive $\mathbf{F}'$ accessible by the adversary, then it remains
secure when replacing $\mathbf{F}'$ with a construction $C^{\mathbf{F}}$ which is
indifferentiable from $\mathbf{F}'$ and letting
the adversary access F. See for a formal treatment in the information- 
theoretic and computational models. 


COLLISION-RESISTANCE. Let $H : \mathcal{K} \times \{0,1\}^* \to \{0,1\}^h$ be a
(keyed) hash function with key generator $\mathcal{K}$. The collision-finding
advantage of an adversary A is


$\mathrm{Adv}^{\mathrm{cr}}_H(A) := P[k \stackrel{\$}{\leftarrow} \mathcal{K}, (M, M') \stackrel{\$}{\leftarrow} A(k) : M \neq M' \wedge H_k(M) = H_k(M')]$.


The notion naturally extends to keyless hash functions (which can be considered
in the same spirit proposed in [24]) and to constructions from some ideal
primitive F (where A is additionally given access to F).


THE MCM-CONSTRUCTION. For a hash function $H : \{0,1\}^* \to \{0,1\}^h$, and
injective maps $M_1 : \{0,1\}^* \to \{0,1\}^*$, $M_2 \in \mathrm{INJ}(h', n)$, where
$n \geq h' \geq h$, the MCM-construction implements a map
$\{0,1\}^* \to \{0,1\}^n$ as


$\mathrm{MCM}^{M_1,H,M_2}(M) := M_2(H(M_1(M)) \| 0^{h'-h})$.


We also define $\mathrm{MCM}^{M_1,H,M_2}_k := \mathrm{MCM}^{M_1,H_k,M_2}$ for all
$k \in \mathcal{K}$ if the hash function H is keyed (with key space $\mathcal{K}$).
Also, the definition does not allow $M_1, M_2$ to be keyed (in contrast to [23]). This
is because we will present keyless instantiations of $M_1, M_2$. Note that we assume
$M_2$ to be fixed input length without loss of generality. The following simple
result was shown in [23], and holds both for keyed as well as for keyless hash
functions.


Lemma 1. For all collision-finding adversaries A outputting a pair of messages
each of length at most $\ell$, there exists a collision-finding adversary B such
that $\mathrm{Adv}^{\mathrm{cr}}_{\mathrm{MCM}^{M_1,H,M_2}}(A) = \mathrm{Adv}^{\mathrm{cr}}_H(B)$,
where $\mathrm{time}(B) = \mathrm{time}(A) + O(2(\ell + \mathrm{time}(M_1, \ell)))$.
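
The reduction behind Lemma 1 is short enough to sketch in Python. Everything concrete below is an assumption chosen for illustration and does not come from [23] or from this paper: H is SHA-256 truncated to 8 bits so that collisions can be found by brute force, and the two injective mixing stages are trivial byte manipulations. The point is only that, because both mixing stages are injective, any collision for MCM maps under the first stage to a collision for H.

```python
import hashlib

H_LEN, HP_LEN = 1, 2        # toy sizes in bytes: h = 8 bits, h' = 16 bits

def H(data):
    """Toy hash: SHA-256 truncated to H_LEN bytes, so collisions are easy to find."""
    return hashlib.sha256(data).digest()[:H_LEN]

def M1(msg):
    """Toy injective first mixing stage: reverse the bytes and append a marker."""
    return msg[::-1] + b"\x80"

def M2(block):
    """Toy injective fixed-input-length second mixing stage: byte-wise XOR mask."""
    assert len(block) == HP_LEN
    return bytes(b ^ 0xA5 for b in block)

def mcm(msg):
    """MCM(M) = M2(H(M1(M)) || 0^(h'-h))."""
    return M2(H(M1(msg)) + b"\x00" * (HP_LEN - H_LEN))

def reduction(m, m_prime):
    """Adversary B of Lemma 1: map an MCM collision to a collision for H."""
    return M1(m), M1(m_prime)

def find_mcm_collision():
    """Brute-force an MCM collision; feasible only because the toy H is 8 bits wide."""
    seen = {}
    i = 0
    while True:
        m = i.to_bytes(4, "big")
        tag = mcm(m)
        if tag in seen:
            return seen[tag], m
        seen[tag] = m
        i += 1
```

Calling `reduction(*find_mcm_collision())` returns two distinct strings with the same value under the toy H, at the cost of two evaluations of the first mixing stage, which matches the shape of the running-time overhead in the lemma.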


PREIMAGE AWARENESS. We briefly review the notion of preimage awareness
for a hash function $H^{\mathbf{F}} : \{0,1\}^* \to \{0,1\}^n$ built from an idealized
primitive $\mathbf{F}$. A preimage extractor $\mathcal{E}$ is a (deterministic)
algorithm taking a history $\alpha$ of input-output pairs of $\mathbf{F}$ and a value
$y \in \{0,1\}^n$ such that $\mathcal{E}(\alpha, y)$ returns a value
$x \in \{0,1\}^* \cup \{\bot\}$. We consider a random experiment (called the pra-game)
involving an adversary A which can query both $\mathbf{F}$ and
$\mathcal{E}(\alpha, \cdot)$ (where $\alpha$ is the current history containing the
interaction with $\mathbf{F}$ so far, i.e., the adversary cannot change the first
argument), and where a set Q contains all $\mathcal{E}$-queries y of A and
an associative array V stores as $V[y] \in \{0,1\}^* \cup \{\bot\}$ (for all
$y \in Q$) the answer of the query y to $\mathcal{E}$. The pra-advantage of the
adversary A with preimage extractor $\mathcal{E}$ and primitive $\mathbf{F}$ is the
quantity


$\mathrm{Adv}^{\mathrm{pra}}_{H,\mathbf{F},\mathcal{E}}(A) := P[(M, y) \stackrel{\$}{\leftarrow} A^{\mathbf{F},\mathcal{E}} : y \in Q \wedge H^{\mathbf{F}}(M) = y \wedge V[y] \neq M]$.


It turns out that preimage aware functions are good domain extenders for
FIL-ROs: More concretely, with H as above, consider the construction
$C^{\mathbf{F},\mathbf{R}'} : M \mapsto \mathbf{R}'(H^{\mathbf{F}}(M))$ for a FIL-RO
$\mathbf{R}' : \{0,1\}^n \to \{0,1\}^n$. Then, the following result was
proved in [12].


Lemma 2 (PRA + FIL-RO = VIL-RO [12]). There exists a simulator S
such that for all distinguishers D making q queries to $C^{\mathbf{F},\mathbf{R}'}$ of
length at most $\ell$, $q_1$ queries to $\mathbf{F}$ and $q_2$ queries to
$\mathbf{R}'$, there exists an adversary A with


$\mathrm{Adv}^{\text{ro-indiff}}_{C,S}(D) \leq \mathrm{Adv}^{\mathrm{pra}}_{H,\mathbf{F},\mathcal{E}}(A)$.


The simulator S runs in time $O(q_1 + q_2 \cdot \mathrm{time}(\mathcal{E}))$ and makes
$q_2$ queries, whereas A runs in time
$\mathrm{time}(D) + O(q \cdot \mathrm{time}(H, \ell) + q_2 + q_1)$ and makes
$q \cdot q_H + q_1$ $\mathbf{F}$-queries and $q_2$ extraction queries, where $q_H$ is
the maximal number of oracle queries made by H to process an input of length at most
$\ell$.


3 An On-Line Mixing Stage: The IMC-Construction 


3.1 Description 


THE IM-CONSTRUCTION. The iterated mix construction (or IM-construction for
short), depicted in Figure 1, relies on a block cipher
$E : \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$ and an injective mapping
$\mathrm{PAD} : \{0,1\}^* \to \{0,1\}^{n/2} \times (\{0,1\}^n)^*$ which
pads every string so that it consists of one n/2-bit block, followed by as many
n-bit blocks as necessary.⁶ On input $M \in \{0,1\}^*$, it first obtains
$\mathrm{PAD}(M) = m_1 \| \ldots \| m_\ell$, and computes the output
$y_1 \| \ldots \| y_\ell$ iteratively such that
$y_1 := E(IV \| m_2, 0^{n/2} \| m_1)$ (where IV is an n-bit fixed initialization
value) and $y_i := E(y_{i-1} \| m_{i+1}, m_i)$ for all $i = 2, \ldots, \ell$, where
$m_{\ell+1} := 0^n$.


Fig. 1. The IM-construction with block cipher $E : \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$


In contrast to the TE-construction of [23], the IM-construction is iterated and
allows for (essentially) online processing, with the minimal restriction that only
the first i − 1 output blocks $y_1, \ldots, y_{i-1}$ can be computed from the first i
message blocks $m_1, \ldots, m_i$. This one-block-lookahead evaluation strategy only
marginally impacts the efficiency of the construction, and is crucial in order to
ensure the desired security requirements.


INJECTIVITY OF THE IM-CONSTRUCTION. It is not difficult to see that the
construction is injective: Given an output $y_1 \| \ldots \| y_\ell$ (for some
$\ell$) we can iteratively and efficiently reconstruct the padding
$m_1 \| \ldots \| m_\ell$ of the input M by computing
$m_i := E^{-1}(y_{i-1} \| m_{i+1}, y_i)$ for all $i = \ell, \ell - 1, \ldots, 2$, with
$m_{\ell+1} = 0^n$, and finally $0^{n/2} \| m_1 := E^{-1}(IV \| m_2, y_1)$. Thus, IM
cannot be a VIL-RIO, and is not even one-way, even though it is surprisingly still
strong enough to instantiate the first mixing step of the MCM approach, as we show
below.
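
To make the iteration and its inversion concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's reference implementation: the block cipher is a toy 4-round Feistel network built from SHA-256 (chosen only because it is invertible and takes a 2n-bit key; it is not a secure cipher), n is 128 bits, the IV is an arbitrary constant, and PAD works at byte rather than bit granularity.

```python
import hashlib

N = 16                      # block length n in bytes (toy choice)
IV = b"\x01" * N            # fixed initialization value (arbitrary choice)
ZERO = bytes(N)             # the all-zero block 0^n

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, r, half):
    # Round function of the toy Feistel cipher.
    return hashlib.sha256(key + bytes([r]) + half).digest()[:N // 2]

def encrypt(key, block):
    """Toy cipher E(key, block) with a 2n-bit key; NOT a secure cipher."""
    left, right = block[:N // 2], block[N // 2:]
    for r in range(4):
        left, right = right, _xor(left, _f(key, r, right))
    return left + right

def decrypt(key, block):
    """Inverse E^{-1}(key, block) of the toy cipher."""
    left, right = block[:N // 2], block[N // 2:]
    for r in reversed(range(4)):
        left, right = _xor(right, _f(key, r, left)), left
    return left + right

def pad(msg):
    """Byte-level PAD: returns [m_1 (n/2 bytes), m_2, ..., m_l (n bytes each)]."""
    data = msg + b"\x80"
    data += bytes((N // 2 - len(data)) % N)
    return [data[:N // 2]] + [data[i:i + N] for i in range(N // 2, len(data), N)]

def unpad(data):
    data = data.rstrip(b"\x00")
    assert data.endswith(b"\x80")
    return data[:-1]

def im(msg):
    """IM(M) = y_1 || ... || y_l with one-block lookahead in the key."""
    blocks = pad(msg) + [ZERO]                       # blocks[i - 1] holds m_i
    y = encrypt(IV + blocks[1], bytes(N // 2) + blocks[0])
    out = [y]
    for i in range(2, len(blocks)):                  # i = 2, ..., l
        y = encrypt(y + blocks[i], blocks[i - 1])    # E(y_{i-1} || m_{i+1}, m_i)
        out.append(y)
    return b"".join(out)

def im_inverse(out):
    """Recover M from IM(M) by decrypting from the last block backwards."""
    ys = [out[i:i + N] for i in range(0, len(out), N)]
    nxt, blocks = ZERO, []                           # nxt starts as m_{l+1} = 0^n
    for i in range(len(ys), 1, -1):                  # i = l, ..., 2
        nxt = decrypt(ys[i - 2] + nxt, ys[i - 1])    # m_i
        blocks.append(nxt)
    first = decrypt(IV + nxt, ys[0])                 # equals 0^{n/2} || m_1
    assert first[:N // 2] == bytes(N // 2)
    blocks.append(first[N // 2:])
    return unpad(b"".join(reversed(blocks)))
```

The backward pass needs no secret information, which is exactly why IM is injective and efficiently invertible rather than a random injective oracle.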


THE IMC-CONSTRUCTION. It is convenient to define the combination of the
IM-construction and a hash function H as the iterated mix-compress construction
(or IMC-construction, for short), which, on input a string $M \in \{0,1\}^*$, outputs
$\mathrm{IMC}^{E,H}(M) := H(\mathrm{IM}^E(M))$. If H is keyed, then we similarly
define the keyed function $\mathrm{IMC}^{E,H}_k(M) := \mathrm{IMC}^{E,H_k}(M)$. Note
that if H can be evaluated online, then this is the case for the IMC-construction as
well.


SHORTER KEY SIZE. The use of a block cipher with key length equal to twice the
block length is acceptable in practice.⁷ Still, in order to ensure compatibility
with a larger number of block ciphers, we propose an alternative construction (called
the DM-IM-construction) which relies on a block cipher
$E : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$, at the cost of making two calls per
processed message block. The underlying idea consists of producing an n-bit key
value at each round by using


6 This can be done in the canonical way by appending the bit 1 followed by as many
0 bits as necessary in order to fulfill the length requirement.
7 For instance, AES supports key size 256 bits with block length n = 128 bits.
the Davies-Meyer construction on $y_{i-1}$ and $m_{i+1}$: More precisely, we compute
$y_1 := E(E(m_2, IV) \oplus IV, 0^{n/2} \| m_1)$ and
$y_i := E(E(m_{i+1}, y_{i-1}) \oplus y_{i-1}, m_i)$ for all $i = 2, \ldots, \ell$. As
above, for a hash function H, we define
$\text{DM-IMC}^{E,H}(M) := H(\text{DM-IM}^E(M))$. (And analogously for the keyed case.)
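
The following Python sketch illustrates the Davies-Meyer key derivation and confirms the cost of two cipher calls per block. As before, the concrete choices are assumptions made for illustration: the cipher is a toy 4-round Feistel network built from SHA-256 with an n-bit key (not a secure cipher), n is 128 bits, the IV is an arbitrary constant, the padding is byte-level, and the call counter `STATS` is a hypothetical helper.

```python
import hashlib

N = 16                      # block length and key length n in bytes (toy choice)
IV = b"\x01" * N            # fixed initialization value (arbitrary choice)
ZERO = bytes(N)             # the all-zero block 0^n
STATS = {"calls": 0}        # counts block-cipher invocations

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key, block):
    """Toy cipher E(key, block) with an n-bit key; NOT a secure cipher."""
    STATS["calls"] += 1
    left, right = block[:N // 2], block[N // 2:]
    for r in range(4):
        f = hashlib.sha256(key + bytes([r]) + right).digest()[:N // 2]
        left, right = right, _xor(left, f)
    return left + right

def pad(msg):
    """Byte-level PAD: returns [m_1 (n/2 bytes), m_2, ..., m_l (n bytes each)]."""
    data = msg + b"\x80"
    data += bytes((N // 2 - len(data)) % N)
    return [data[:N // 2]] + [data[i:i + N] for i in range(N // 2, len(data), N)]

def dm_key(msg_block, chain):
    """Davies-Meyer step E(m, y) XOR y, used as the n-bit key of the next call."""
    return _xor(encrypt(msg_block, chain), chain)

def dm_im(msg):
    """DM-IM(M) = y_1 || ... || y_l."""
    blocks = pad(msg) + [ZERO]                       # blocks[i - 1] holds m_i
    y = encrypt(dm_key(blocks[1], IV), bytes(N // 2) + blocks[0])
    out = [y]
    for i in range(2, len(blocks)):                  # i = 2, ..., l
        y = encrypt(dm_key(blocks[i], y), blocks[i - 1])
        out.append(y)
    return b"".join(out)
```

For a message that pads to four blocks, `STATS["calls"]` increases by eight during one evaluation, i.e., two calls per processed block.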


3.2 Preimage Awareness 


The purpose of this section is to prove that, for an ideal cipher
$\mathbf{E} : \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$, the construction
$\mathrm{IMC}^{\mathbf{E},H}$ is preimage aware, provided H satisfies very weak
randomness-preserving properties that we discuss first.


HASH FUNCTION BALANCE. The IMC-construction does not exhibit any useful 
properties if H can be arbitrary (consider e.g. the case where H is constant). It 
is nevertheless reasonable to assume H to satisfy minimal structural properties 
which could be (and generally are) ensured by design. In particular, we require 
H to preserve some of the randomness of a uniformly chosen input m of a 
given length n (where n is e.g. the block length of the cipher used in the IM- 
construction), and this should hold even if m is appended to some other fixed 
input string M. 


Definition 1. An (unkeyed) hash function $H : \{0,1\}^* \to \{0,1\}^h$ is
$(\varepsilon, n)$-prefix-balanced if for all messages $M \in (\{0,1\}^n)^*$ and hash
function outputs $y \in \{0,1\}^h$ we have
$P[m \stackrel{\$}{\leftarrow} \{0,1\}^n : H(M \| m) = y] \leq \varepsilon$.


The notion extends naturally to a keyed hash function
$H : \{0,1\}^\kappa \times \{0,1\}^* \to \{0,1\}^h$: We say that it is
$(\varepsilon, n)$-prefix-balanced if for all keys k the function $H_k$ is
$(\varepsilon(k), n)$-prefix-balanced, and
$\sum_k P(k) \cdot \varepsilon(k) \leq \varepsilon$, where P(k) is the probability
that the key generator samples the key k. We remark that the best $\varepsilon$ one
can hope for is $\varepsilon = 2^{-n}$ as long as $n \leq h$ holds, whereas
$\varepsilon \geq 2^{-h}$ for n > h. Note that our notion is somewhat incomparable to
the one of [23], where on the one hand balancedness under variable input lengths is
considered (rather than for some fixed length n, as in our case), but, on the other
hand, the property is not required under prepending of fixed prefixes: Still we find
this extension to be natural in a hashing scenario. It is important to realize that
prefix balancedness does not imply any useful security properties for H: The function
$H : (\{0,1\}^n)^+ \to \{0,1\}^n$ such that $H(M \| m) := m$ for all n-bit strings m
and all M with length multiple of n is $(2^{-n}, n)$-prefix-balanced, despite finding
collisions or preimages in this function being trivial.
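
For very small block lengths the probability in Definition 1 can be computed exhaustively. The Python sketch below is an illustration only: n is set to 8 bits so that all blocks can be enumerated, the helper `epsilon_for_prefix` is a hypothetical name, and it evaluates a single prefix M, whereas the definition takes the worst case over all prefixes.

```python
import hashlib
from collections import Counter
from fractions import Fraction

N_BITS = 8   # toy block length n, small enough to enumerate all 2^n blocks

def epsilon_for_prefix(H, prefix):
    """max over y of Pr[m uniform in {0,1}^n : H(prefix || m) = y], by enumeration."""
    counts = Counter(H(prefix + bytes([m])) for m in range(2 ** N_BITS))
    return Fraction(max(counts.values()), 2 ** N_BITS)

def last_block(data):
    """The insecure example from the text: H(M || m) := m."""
    return data[-1:]

def constant(data):
    """Degenerate hash with a constant output, the worst case for balance."""
    return b"\x00"

def truncated_sha(data):
    """SHA-256 truncated to one byte, as a third point of comparison."""
    return hashlib.sha256(data).digest()[:1]
```

Here `epsilon_for_prefix(last_block, b"ab")` evaluates to 1/256, the optimum for n = h = 8, although collisions for this function are trivial, while the constant function yields 1.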


MAIN THEOREM. The following theorem is the main result of the first part of 
this paper: It provides a concrete characterization of the security of the IMC- 
construction in the ideal-cipher model. We stress that the result only relies on 
E being an ideal cipher, and H being sufficiently balanced, but no computa- 
tional assumption is made, i.e., the result holds with respect to computationally 
unbounded adversaries. 


Theorem 1 (Preimage Awareness of IMC). Let
$\mathbf{E} : \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$ be an ideal cipher and let
$H : \{0,1\}^* \to \{0,1\}^h$ be an $(\varepsilon, n)$-prefix-balanced
hash function. There exists a preimage extractor $\mathcal{E}$ (given in the proof)
such that, for all adversaries A issuing at most q queries to $\mathbf{E}$ and $q_e$
queries to $\mathcal{E}$, we have


Advi cen peA) <3 alq +1) 2) +q. 27/2 + g(q+ 24e) - $. 


Furthermore, $\mathcal{E}$ answers an extraction query in time $O(|\alpha| \cdot \log |\alpha|)$.


The result extends naturally to a keyed hash function by just averaging the
bound over all choices of the key. The security of IMC is bounded by (roughly)
$\min\{2^{n/2}, 1/\sqrt{\varepsilon}\}$, and is not worse than the one in the TE-construction (which
additionally relies on the security of the underlying trapdoor one-way
permutation). Note that Theorem 1 is concerned with the entire IMC-construction: An
interesting (and seemingly challenging) open question consists of distilling the
(minimal) properties needed by IM to yield preimage awareness for IMC.

The remainder of this section is devoted to the proof outline of Theorem 1.
Technical details are postponed to the full version, as well as a discussion on
how to obtain similar bounds for DM-IMC.


INTERACTION GRAPHS. An interaction with the ideal cipher $\mathbf{E}$ can be described
in terms of the history $\alpha$, consisting of triples (k, x, y), where
$k \in \{0,1\}^{2n}$, and $x, y \in \{0,1\}^n$. Both a forward query
$\mathbf{E}(k, x)$ with output y and a backward query $\mathbf{E}^{-1}(k, y)$ with
output x result in a triple (k, x, y) being added to $\alpha$.⁸ However,
it is far more convenient to describe $\alpha$ in terms of a directed (edge-labeled)
graph $G = G(\alpha) = (V, E)$ with vertex set $V := \{0,1\}^n$ and edge set
$E \subseteq V \times V$ such that $(y, y') \in E$ with labels
$\mathrm{label}(y, y') = m$ and $\mathrm{next}(y, y') = m'$ if
(i) $(y \| m', m, y') \in \alpha$ with $y \neq IV$ or
(ii) $(y \| m', 0^{n/2} \| m, y') \in \alpha$ if y = IV. A (directed) path
$IV = y_0 \to y_1 \to \cdots \to y_\ell$ in G is called valid if for all
$i = 1, \ldots, \ell - 1$ we have
$\mathrm{label}(y_i, y_{i+1}) = \mathrm{next}(y_{i-1}, y_i)$. It is additionally
called complete if $\mathrm{next}(y_{\ell-1}, y_\ell) = 0^n$. The value of a complete
valid path is defined as $H(y_1 \| \ldots \| y_\ell)$, and its preimage is the string
M which is padded to
$\mathrm{label}(y_0, y_1) \| \cdots \| \mathrm{label}(y_{\ell-1}, y_\ell)$.


THE PREIMAGE EXTRACTOR $\mathcal{E}$. On input a history $\alpha$ and a (potential) output
$z \in \{0,1\}^n$ of IMC, the preimage extractor $\mathcal{E}$ first computes the subgraph $G'$ of
$G(\alpha)$ induced by the vertices which are reachable through a valid path. If $G'$ is
not a directed tree, then $\mathcal{E}$ aborts and outputs $\bot$. Otherwise, if $G'$ contains one
single valid complete path with value $z$ and preimage $M$, it outputs $M$. In any
other case, it outputs $\bot$.

It is not hard to see that $\mathcal{E}$ can be implemented with running time $O(|\alpha| \cdot
\log|\alpha|)$ (i.e., where $|\alpha|$ approximately equals the number of edges in the graph
$G(\alpha)$) due to the fact that $\mathcal{E}$ aborts if $G'$ is not a tree: Otherwise, the number
of possible valid paths may be very high, even exponential.$^{9}$


8 The actual history used in the definition of preimage awareness indeed contains more 
information, such as whether the triple is added by a forward or by a backward query, 
but this is irrelevant in the following. 

9 One may argue that we are taking a rather conservative approach: Even if the graph
were not a tree, it would most likely have a limited number of valid paths. Still, this 
considerably simplifies the security analysis with no noticeable loss in the obtained 
bounds. 
PROOF INTUITION. Assume without loss of generality that the adversary A
never repeats a query$^{10}$ and that whenever it terminates in the pra-game
outputting a pair $(M, z)$, it has made all queries necessary to evaluate the
IMC-construction on input $M$ (with output $z$). In other words, the interaction graph
$G(\alpha)$ of the final history $\alpha$ contains a valid complete path with preimage $M$ and
value $z$. But because the query $z$ was previously issued to $\mathcal{E}$, if A wins the game,
one of the following has to occur: (i) The subgraph of the valid paths is not a
directed tree, (ii) No valid path with value $z$ existed when the $\mathcal{E}$-query $z$ was
issued, but such a path was created afterwards, or (iii) There exist at least two
valid paths with value $z$. We show that these events are unlikely.

A key step is proving that, with very high probability, valid paths are con- 
structed only by means of forward queries: A construction of a valid path by 
backward queries may be successful either because we can “connect” the path 
with an already existing one (built by forward queries), or because we construct 
the entire path backwards. However, both cases turn out to be unlikely: In the 
former case, a fresh backward query outputs a random m (under the permutation 
property), and this can only be the next-label for an already existing edge with 
low probability. (This motivates the one-block-lookahead strategy in IM.) In the 
latter case, it is very unlikely to have all of the first n/2 bits returned by the 
last evaluation query being equal to 0. (This motivates the padding in the first 
block.) However, if a path is generated only by forward queries, we can ensure 
that the value of a valid path is always sufficiently random due to the prefix- 
balancedness of H. We refer the reader to the full version for a formalization of 
this argument. 

This highlights a very intriguing property of the IM-construction: Although 
it can be efficiently inverted on any valid output, it is very unlikely that we can 
come up with such a valid output without first evaluating the construction. (In
particular, this prevents even a known collision for H from leading to a valid
collision for the IMC-construction.)


4 A Length-Preserving Mixing Stage: Random 
Permutation Oracles 


Post-processing the output of the IMC-construction with a random injective 
oracle yields a full-fledged random oracle (by Theorem [and Lemma), whose 
collision resistance can be reduced to the one of the underlying function H in the 
standard model by Lemma[]] The use of the TE-construction for this task is 
subject to two main drawbacks: It requires a trapdoor one-way permutation and 
also enlarges the output of the compressing stage. (The lack of online evaluation 
capabilities is not a restriction, as we have to process only inputs of fixed length 
equal the output length of the underlying hash function.) In this section, we 
solve both issues. We present a block-cipher based construction of a fixed input- 
length length-preserving RIO, i.e., a (non-invertible) random permutation oracle 


10 In particular, if A asks a forward query $E(k, x)$ which is answered by $y$, the matching
backward query $E^{-1}(k, y)$ is never issued. (And vice versa.)
(RPO), that only relies on a one-way permutation without a trapdoor. In the
full version, we show that this assumption is somewhat minimal, as RIOs/RPOs
cannot be built from an ideal primitive and a one-way function.

Additionally, in order to reduce the dependence between the underlying block- 
and hash sizes, we present domain/range extenders for RPOs. 


4.1 Making Block-Ciphers Non-invertible: The NIRP-Construction 


DESCRIPTION. The NIRP-construction combines a permutation $\pi : \{0,1\}^n \to
\{0,1\}^n$ and two (fixed-key) ciphers $E_1, E_2 : \{0,1\}^n \to \{0,1\}^n$ in a "sandwich-like"
manner. More precisely, for any input $m \in \{0,1\}^n$ the NIRP-construction
is defined such that $\mathrm{NIRP}^{E_1,E_2,\pi}(m) := E_2(\pi(E_1(m)))$. (Also cf. Figure 2.)
Obviously, $\mathrm{NIRP}^{E_1,E_2,\pi}$ is a permutation.
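The sandwich structure can be sketched on a toy domain. In the following minimal Python sketch, the block size `N_BITS`, the seeded shuffles standing in for the fixed-key ciphers, and the table `PI` are all illustrative assumptions; in particular `PI` is an arbitrary permutation and is not one-way, so the sketch only illustrates the composition and the fact that it is a permutation.

```python
import random

N_BITS = 8                      # toy block size n (assumption; real ciphers use n >= 128)
DOMAIN = 1 << N_BITS


def random_permutation(seed):
    """Return a fixed pseudo-random permutation of {0, ..., 2^n - 1} as a lookup table."""
    rng = random.Random(seed)
    table = list(range(DOMAIN))
    rng.shuffle(table)
    return table


E1 = random_permutation("E1")   # stands in for the fixed-key cipher E_1
E2 = random_permutation("E2")   # stands in for the fixed-key cipher E_2
PI = random_permutation("pi")   # stands in for pi (NOT one-way in this toy)


def nirp(m):
    """NIRP(m) := E_2(pi(E_1(m)))."""
    return E2[PI[E1[m]]]


# A composition of three permutations is a permutation: every value is hit once.
outputs = [nirp(m) for m in range(DOMAIN)]
is_permutation = sorted(outputs) == list(range(DOMAIN))
```

Computing the inverse of `nirp` would require inverting `PI`, which is exactly what the one-wayness assumption on $\pi$ is meant to rule out in the real construction.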


SECURITY OF NIRP. We show that the NIRP-construction is indifferentiable
from a (non-invertible) random permutation oracle if instantiated with two ideal
single-key$^{11}$ block ciphers $E_1, E_2$ and a one-way permutation $\pi$ (without a trapdoor).
The result is summarized by the following theorem.


Theorem 2. Let $E_1, E_2 : \{0,1\}^n \to \{0,1\}^n$ be two independent fixed-key ideal
ciphers and let $\pi : \{0,1\}^n \to \{0,1\}^n$ be a permutation. There exists a simulator
S (given in the proof) such that for all distinguishers D issuing at most $q$ queries
to the NIRP-construction, and at most $q_a, q_b, q_c, q_d$ queries to $E_1, E_1^{-1}, E_2, E_2^{-1}$,
respectively, there exists an owp-adversary A with


$$\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{NIRP}^{E_1,E_2,\pi},S}(D) \le 2 \cdot q_c(2q + q_a) \cdot 2^{-n} + q_d \cdot \mathrm{Adv}^{\mathrm{owp}}_{\pi}(A).$$


The simulator S runs in time $O(q_a + q_b + q_c + q_d + (2q_a + q_b + 2q_d) \cdot \mathrm{time}(\pi))$
and makes $q_a + 2q_b + 2q_c$ queries to its oracle, whereas the adversary A runs in
time $\mathrm{time}(A) \le \mathrm{time}(D) + \mathrm{time}(S)$.


OUTLINE OF THE PROOF. The first part of the indifferentiability proof describes
the simulator $S^P$ that mimics the ideal ciphers $E_1, E_2$ (with their inverses
$E_1^{-1}, E_2^{-1}$) given access to a RPO $P : \{0,1\}^n \to \{0,1\}^n$. Moreover we use the
notation $S^P = (S_{E_1}, S_{E_2}, S_{E_1^{-1}}, S_{E_2^{-1}})$ to make the four sub-oracles of the
simulator (answering the different query types) explicit. The second part (which is
postponed to the full version) upper bounds D's advantage $\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{NIRP}^{E_1,E_2,\pi},S}(D)$
in distinguishing the ideal setting (with a simulator) and the real setting.


THE SIMULATOR. The global state of the simulator $S^P$ consists of a table $T$
(which is initially empty) of tuples of the form $(a, b, c, d)$ consistent with evaluations
of the NIRP-construction as in Figure 2, that is, where $a, b$ are simulated
input-output values of the first cipher $E_1$, i.e., $E_1(a) = b$ (which can be generated
both by forward queries to $E_1$ and by backward queries to $E_1^{-1}$) and analogously


11 Recall that in the ideal cipher model, it is easy to derive two such ciphers from a
single ideal cipher $E : \{0,1\}^{*} \times \{0,1\}^n \to \{0,1\}^n$ as $E_1 := E(k_1, \cdot)$ and $E_2 := E(k_2, \cdot)$
for two arbitrary distinct keys $k_1 \ne k_2$.
$c, d$ play the same role for the second block cipher $E_2$. Furthermore, the invariant
$c = \pi(b)$ and $P(a) = d$ holds. It is also convenient to define $A \subseteq \{0,1\}^n$ as the
set of values $a \in \{0,1\}^n$ such that $(a, b, c, d) \in T$ for some $b, c, d$. Analogously,
we define the sets $B$, $C$, and $D$.

To achieve perfect simulation given oracle access to $P$, upon a new query to
one of its four sub-oracles the simulator defines a new tuple $(a, b, c, d)$ in $T$, with
the input of the query placed at the appropriate position (as long as no such
tuple already exists, in which case the corresponding output value is returned),
and such that all remaining components are set to independent random values
conditioned on these individual values appearing in no other tuple, on $d = P(a)$,
and on $c = \pi(b)$. This is easily achievable with access to $\pi^{-1}$ and $P^{-1}$: For
example, on input $a$ (to $S_{E_1}$), we choose a random $b \in \{0,1\}^n \setminus B$ (i.e., different
from all $b'$ appearing in some other tuple), and set $c := \pi(b)$ and $d := P(a)$. (This
is done analogously on input $b$.) On the other hand, on input $c$, we compute
$b := \pi^{-1}(c)$, a random $a \in \{0,1\}^n \setminus A$ (i.e., different from all previous $a'$), and
then set $d := P(a)$. Finally, on input $d$, we set $a := P^{-1}(d)$ and subsequently
generate a random $b \in \{0,1\}^n \setminus B$ and set $c := \pi(b)$.

However, in our setting we have to dispense with $\pi^{-1}$ and $P^{-1}$. In particular,
this means that in the latter two cases the simulator cannot set the values $b$
and $a$, respectively, but rather sets these components to a dummy value $\bot$, and
completes these tuples with the actual values if they eventually appear as inputs
of $E_1$ or $E_1^{-1}$ queries. Also note that the simulator must not generate random
values $a$ and $b$ that collide with a dummy value in order to ensure the permutation
property. This can be efficiently avoided by simply testing that $P(a) \ne d$ (and
$\pi(b) \ne c$) for all $d$'s in tuples of the form $(\bot, b, c, d)$ (all $c$'s in tuples of the
form $(a, \bot, c, d)$), and whenever the test fails, we replace the dummy value by
the actual value, and draw a new $a$ (or $b$). There are only two remaining cases
where the simulator fails to answer queries (and aborts):


(i) A query $a$ is made and a tuple $(a, \bot, c, d)$ exists: In this case the simulator
must return $\pi^{-1}(c)$, but this requires inverting $\pi$, which is generally not
feasible. (Call this event $\mathrm{Abort}_1$.)

(ii) A query $b$ is made and a tuple $(\bot, b, c, d)$ exists: In this case, the simulator
must return $P^{-1}(d)$, but cannot invert $P$. (Call this event $\mathrm{Abort}_2$.)


By the above discussion, perfect simulation is achieved until one of these events
occurs: A game-based argument yields $\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{NIRP}^{E_1,E_2,\pi},S}(D) \le \Pr[\mathrm{Abort}_1] +
\Pr[\mathrm{Abort}_2]$. In the full version we give a complete pseudo-code description of the
simulator and show that the probabilities of both events are very small.
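The table-based simulation just described can be sketched as follows. This is only a toy reading of the prose, not the pseudo-code of the full version: the class and method names, the use of `None` for the dummy value, the tiny block size, and the lookup-table oracles are all assumptions made for illustration.

```python
import random


class SimulatorAbort(Exception):
    """Abort_1 / Abort_2: answering would require inverting pi or P."""


class NirpSimulator:
    def __init__(self, n_bits, P, pi, seed=0):
        self.size = 1 << n_bits
        self.P, self.pi = P, pi          # forward-only oracles (callables)
        self.rng = random.Random(seed)
        self.T = []                      # tuples [a, b, c, d]; None is the dummy value

    def _find(self, pos, value):
        return next((t for t in self.T if t[pos] == value), None)

    def _fresh(self, pos, oracle):
        """Draw a fresh a (pos=0, oracle=P) or b (pos=1, oracle=pi) plus its image.

        If the image already occurs in a tuple, the drawn value is that tuple's
        dummy: fill it in and draw again."""
        while True:
            x = self.rng.randrange(self.size)
            if self._find(pos, x) is not None:
                continue
            image = oracle(x)
            hidden = self._find(3 - pos, image)   # a pairs with d, b pairs with c
            if hidden is None:
                return x, image
            hidden[pos] = x

    def E1(self, a):
        t = self._find(0, a)
        if t is None:
            d = self.P(a)
            t = self._find(3, d)
            if t is not None:
                t[0] = a                          # a was a dummy: complete the tuple
            else:
                b, c = self._fresh(1, self.pi)
                t = [a, b, c, d]
                self.T.append(t)
        if t[1] is None:
            raise SimulatorAbort("Abort_1")       # would need pi^{-1}(c)
        return t[1]

    def E1_inv(self, b):
        t = self._find(1, b)
        if t is None:
            c = self.pi(b)
            t = self._find(2, c)
            if t is not None:
                t[1] = b                          # b was a dummy: complete the tuple
            else:
                a, d = self._fresh(0, self.P)
                t = [a, b, c, d]
                self.T.append(t)
        if t[0] is None:
            raise SimulatorAbort("Abort_2")       # would need P^{-1}(d)
        return t[0]

    def E2(self, c):
        t = self._find(2, c)
        if t is None:
            a, d = self._fresh(0, self.P)
            t = [a, None, c, d]                   # b = pi^{-1}(c) is unknown
            self.T.append(t)
        return t[3]

    def E2_inv(self, d):
        t = self._find(3, d)
        if t is None:
            b, c = self._fresh(1, self.pi)
            t = [None, b, c, d]                   # a = P^{-1}(d) is unknown
            self.T.append(t)
        return t[2]


N_BITS = 6
_rng = random.Random(1)
P_table = _rng.sample(range(1 << N_BITS), 1 << N_BITS)
pi_table = _rng.sample(range(1 << N_BITS), 1 << N_BITS)
P, pi = P_table.__getitem__, pi_table.__getitem__

# Forward evaluation through the simulated ciphers agrees with P.
sim = NirpSimulator(N_BITS, P, pi)
consistent = all(sim.E2(pi(sim.E1(a))) == P(a) for a in range(1 << N_BITS))

# A dummy b is completed once the matching b is queried to E1^{-1}.
sim2 = NirpSimulator(N_BITS, P, pi, seed=7)
d2 = sim2.E2(5)
completed_ok = P(sim2.E1_inv(pi_table.index(5))) == d2

# Querying the hidden a of a tuple (a, dummy, c, d) triggers Abort_1.
sim3 = NirpSimulator(N_BITS, P, pi, seed=7)
d3 = sim3.E2(5)
try:
    sim3.E1(P_table.index(d3))
    aborted = False
except SimulatorAbort:
    aborted = True
```

The demo inverts the lookup tables with `index` only from the outside, to construct the queries that complete a tuple or force an abort; the simulator itself never inverts `P` or `pi`.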


NIRP = MCM WITH INVERTIBLE MIXING STEPS? Our NIRP-construction
somehow reflects the MCM design with a permutation, instead of a hash
function, and this may suggest that the MCM approach works for invertible
mixing steps as well. Yet, we remark that the proof cannot be adapted to the
case where the first mixing stage processes inputs of variable input-length: The
problem is that in the simulation of queries to $E_2$ and $E_2^{-1}$ we need to choose
a pair $a, P(a)$ and $b, \pi(b)$ respectively, and at a later time possibly learn the
missing dummy values b and c when they are queried. But in order for this to
succeed, we need the length of a and b to be compatible with the one of such
later query, which is of course impossible in the variable-input-length case.

Fig. 2. Left: The NIRP-construction for underlying fixed-key block ciphers $E_1, E_2$, with
$(a, b, c, d)$ corresponding to the notation used in the simulator of Theorem 2. Right: The
ESS-construction for underlying permutations $P_1, \ldots, P_6 : \{0,1\}^n \to \{0,1\}^n$.


4.2 Extension of Random Permutation Oracles 


The use of the NIRP-construction to post-process the output of a hash function H
requires a block cipher with block size at least as large as its hash size, i.e., typically
at least 160 bits. While block ciphers with large block size exist,$^{12}$ ciphers such as
AES support only rather small block lengths, such as 128 bits. This motivates the
following natural question: Given a RPO $P : \{0,1\}^n \to \{0,1\}^n$, can we devise a
construction $C^P : \{0,1\}^m \to \{0,1\}^m$ for $m > n$ which implements a permutation
and is indifferentiable from a RPO? Note that this calls for simultaneous domain
and range extension of P, while we additionally want to ensure injectivity of the
resulting construction. The problem is similar in spirit to the one considered in the
private-key setting by Halevi and Rogaway [16], even though the peculiarities of
the public setting make constructions far more challenging.$^{13}$


THE ESS-CONSTRUCTION. We present a construction, called ESS, for the case
$m = 2n$ that relies on six permutations $P_1, \ldots, P_6 : \{0,1\}^n \to \{0,1\}^n$ and is
reminiscent of the compression function $\mathrm{SS}^{P_1,P_2,P_3} : \{0,1\}^{2n} \to \{0,1\}^n$ by Shrimpton
and Stam [5] such that $\mathrm{SS}^{P_1,P_2,P_3}(m_1 \| m_2) := P_3(P_1(m_1) \oplus P_2(m_2)) \oplus P_1(m_1)$.
It adds three extra calls (as depicted in Figure 2) to ensure both indifferentiability
of the 2n-bit output, as well as invertibility. It is indeed not hard to verify
that ESS implements a permutation: Given output $y_1 \| y_2$, the first input-half
$m_1$ is retrieved by computing $z := P_6^{-1}(y_2)$, $m_1 := z \oplus P_5^{-1}(y_1)$, and finally we
compute $m_2 := P_2^{-1}(P_1(m_1) \oplus P_3^{-1}(P_1(m_1) \oplus P_4^{-1}(z)))$. (Of course, the inverses
$P_i^{-1}$ are not efficiently computable in general, but they are well-defined.)

12 Interestingly, such block ciphers are exactly the ones used within hash functions, e.g.,
to instantiate the Davies-Meyer construction.

13 In particular, each such extender implies the construction of a compression function
$\{0,1\}^m \to \{0,1\}^\ell$ for all $\ell < m$ from length-preserving random oracles which is
indifferentiable from a random oracle from $m$ bits to $\ell$ bits, a problem which has recently
received much interest (cf. e.g. 2OIZ5}). On top of this, injectivity is an extra design
challenge.
INDIFFERENTIABILITY OF ESS. The following theorem shows that whenever
the underlying permutations are independent RPOs, the ESS-construction is
indifferentiable from a RPO up to the birthday barrier.


Theorem 3. Let $P_1, \ldots, P_6 : \{0,1\}^n \to \{0,1\}^n$ be independent RPOs. There
exists a simulator S such that for all distinguishers D making at most $q$ queries
to the ESS-construction and to each of the underlying RPOs, we have


$$\mathrm{Adv}^{\mathrm{indiff}}_{\mathrm{ESS}^{P_1,\ldots,P_6},S}(D) \le \left[ q^2 \cdot (4n^2 + n + 28) + q \cdot (8n + 13) \right] \cdot 2^{-n}.$$

The simulator S runs in time $O(q^2)$ and makes $q$ queries.


ARBITRARY EXTENSION. A generalization of ESS, called MD-ESS, to construct
a RPO $\{0,1\}^{i \cdot n} \to \{0,1\}^{i \cdot n}$ for $i > 2$ using $4 + i$ independent RPOs from $n$
bits to $n$ bits and making $4i + 1$ RPO evaluations in total can be obtained as
follows: Let $\mathrm{MD\text{-}SS}^{P_1,P_2,P_3} : \{0,1\}^{*} \to \{0,1\}^n$ be the (plain) Merkle-Damgård
iteration (with no strengthening) that on input $M = m_1 \| \ldots \| m_i$ computes
$v_j = \mathrm{SS}^{P_1,P_2,P_3}(v_{j-1} \| m_j)$ for $j = 1, \ldots, i$ (with $v_0$ being the IV), and outputs
$v_i$. Then, on input $M = m_1 \| \ldots \| m_i \in \{0,1\}^{i \cdot n}$, MD-ESS first computes $y :=
P_4(\mathrm{MD\text{-}SS}^{P_1,P_2,P_3}(M))$, and finally outputs


$$(P_{4+1}(y) \oplus m_1) \| \cdots \| (P_{4+i-1}(y) \oplus m_{i-1}) \| P_{4+i}(y).$$


To verify that MD-ESS implements a permutation, we remark that its output
uniquely determines $y$ and $m_1, \ldots, m_{i-1}$, whereas $m_i$ is determined by the chaining
value $v_{i-1}$ and $P_4^{-1}(y)$ as in the ESS-construction. Its security is shown in
the full version. There, we also show that $P_{4+1}, \ldots, P_{4+i-1}$ (but not $P_{4+i}$) can be
replaced by (invertible) single-key (ideal) ciphers. Also, it can easily be modified
to support inputs with lengths $n' > n$ which are not multiples of $n$.
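The permutation claim can be checked exhaustively on a toy instance. The sketch below assumes the output formula reads $(P_{4+1}(y) \oplus m_1) \| \cdots \| (P_{4+i-1}(y) \oplus m_{i-1}) \| P_{4+i}(y)$; the block length `N`, the number of blocks `NUM_BLOCKS`, the all-zero `IV`, and the seeded lookup tables standing in for the RPOs are illustrative assumptions.

```python
import random
from itertools import product

N = 3                 # toy block length n in bits (assumption)
NUM_BLOCKS = 3        # i, so the domain is {0,1}^(i*n)
IV = 0
_rng = random.Random(2009)


def rand_perm():
    table = list(range(1 << N))
    _rng.shuffle(table)
    return table


# P[1], ..., P[4 + i], indexed from 1 as in the text (P[0] is unused).
P = [None] + [rand_perm() for _ in range(4 + NUM_BLOCKS)]


def ss(v, m):
    """SS(v || m) := P3(P1(v) xor P2(m)) xor P1(v)."""
    return P[3][P[1][v] ^ P[2][m]] ^ P[1][v]


def md_ess(blocks):
    """Plain Merkle-Damgard iteration of SS, then P4 and the masked output blocks."""
    v = IV
    for m in blocks:
        v = ss(v, m)
    y = P[4][v]
    out = [P[4 + j][y] ^ blocks[j - 1] for j in range(1, NUM_BLOCKS)]
    out.append(P[4 + NUM_BLOCKS][y])
    return tuple(out)


# Injectivity on the whole (finite) domain implies that the map is a permutation.
images = {md_ess(M) for M in product(range(1 << N), repeat=NUM_BLOCKS)}
is_permutation = len(images) == (1 << N) ** NUM_BLOCKS
```

The check mirrors the argument in the text: the last output block fixes $y$, the masked blocks then fix $m_1, \ldots, m_{i-1}$, and the final SS call is injective in $m_i$ once $v_{i-1}$ is known.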


5 Conclusions 


In this paper, we have shown the first modular and fault-tolerant hash function 
construction which achieves both collision resistance in the standard model and 
indifferentiability in the ideal model. In particular, this was achieved by building 
appropriate mixing steps IM and NIRP that are compatible with the MCM- 
construction and preserve the practical features of the inner compressing part, 
i.e., the hash function H. By Lemma [JJ the construction $\mathrm{MCM}^{IM,H,NIRP}$ (where
possibly NIRP is replaced by its extension through one of the constructions
presented in Section 4.2) inherits the collision resistance of H, as IM and NIRP
are injective functions. In the ideal setting, we have shown that the combination
of IM and H is preimage aware as long as H is sufficiently balanced (Theorem 1),
and that NIRP is indifferentiable from a random permutation oracle (Theorem 2).
Thus, by applying Lemma] we conclude that $\mathrm{MCM}^{IM,H,NIRP}$ is indifferentiable
from a variable-input-length random oracle.

While the IM-construction is very practical, the implementation of the
NIRP-construction, despite its efficiency, is conditioned on the existence of a one-way
permutation with input length equal to that of existing block ciphers. Indeed,
sufficiently-secure candidate one-to-one functions exist for similar input parameters
(e.g., the discrete logarithm problem in properly chosen elliptic curves of
prime order $q \approx 2^n$ can in general not be solved better than with running time
roughly $O(2^{n/2})$, i.e., the security of our constructions), but the fact that the
block cipher expects n-bit inputs makes their use difficult.$^{14}$ However, we stress
that such data-type conversion problems are common in practical constructions.
For instance, when using an RSA-based trapdoor one-way permutation, the output
of the TE-construction [23] must be (injectively) transformed into a string,
and the result may be far from being random (attempting to extract random
bits would destroy the injectivity property). It is our strong belief that these results
should foster further research in designing good candidates for such central
cryptographic primitives working at the bit level.
Abstract. At Crypto 2005, Coron et al. showed that the Merkle-Damgård
hash function (MDHF) with a fixed input length random oracle is not
indifferentiable from a random oracle RO due to the extension attack.
Namely, MDHF does not behave like RO. This result implies that there
exists some cryptosystem secure in the RO model but insecure under
MDHF. However, this does not imply that no cryptosystem is secure
under MDHF. This fact motivates us to establish a methodology
for confirming cryptosystem security under MDHF.

In this paper, we confirm cryptosystem security by using the following
approach:


1. Find a variant, $\widetilde{RO}$, of RO which leaks the information needed to
realize the extension attack.

2. Prove that MDHF is indifferentiable from $\widetilde{RO}$.

3. Prove cryptosystem security in the $\widetilde{RO}$ model.


From the indifferentiability framework, a cryptosystem secure in the $\widetilde{RO}$
model is also secure under MDHF. Thus we concentrate on finding $\widetilde{RO}$,
which is weaker than RO.

We propose the Traceable Random Oracle (TRO) which leaks enough
information to permit the extension attack. By using TRO, we can easily
confirm the security of OAEP and variants of OAEP. However, there are
several practical cryptosystems whose security cannot be confirmed by
TRO (e.g. RSA-KEM). This is because TRO leaks information that is
irrelevant to the extension attack. Therefore, we propose another $\widetilde{RO}$,
the Extension Attack Simulatable Random Oracle, ERO, that leaks just
the information needed for the extension attack. Fortunately, ERO is
necessary and sufficient to confirm the security of cryptosystems under
MDHF. This means that the security of any cryptosystem under MDHF
is equivalent to that under the ERO model. We prove that RSA-KEM is
secure in the ERO model.


Keywords: Indifferentiability, Merkle-Damgård hash function, Variants
of Random Oracle, Cryptosystems Security.
1 Introduction 


Indifferentiability Framework. Maurer et al. introduced the indifferentiability
framework as a notion stronger than indistinguishability. This framework
deals with the security of two systems $C(\mathcal{V})$ and $C(\mathcal{U})$: for cryptosystem $C$, $C(\mathcal{V})$
retains at least the same level of provable security as $C(\mathcal{U})$ if primitive $\mathcal{V}$ is
indifferentiable from primitive $\mathcal{U}$, denoted by $\mathcal{V} \sqsubset \mathcal{U}$. This definition allows us
to use construction $\mathcal{V}$ instead of $\mathcal{U}$ in any cryptosystem $C$ and retain the same
level of provable security due to the indifferentiability framework of Maurer et
al. [QJ]. We denote "$C(\mathcal{V})$ is at least as secure as $C(\mathcal{U})$" by $C(\mathcal{V}) \succ C(\mathcal{U})$. More
strictly, $\mathcal{V} \sqsubset \mathcal{U} \Leftrightarrow C(\mathcal{V}) \succ C(\mathcal{U})$ holds. This result implies that if cryptosystem
$C$ is secure in the $\mathcal{U}$ model and $\mathcal{V} \sqsubset \mathcal{U}$ holds, $C$ is secure in the $\mathcal{V}$ model, and
if $\mathcal{V} \not\sqsubset \mathcal{U}$ holds, there is some cryptosystem that is secure in the $\mathcal{U}$ model but
insecure in the $\mathcal{V}$ model.


Indifferentiability and the MD Construction. While many cryptosystems 
have been proven to be secure in the random oracle (RO) model B] (e.g. FDH 
BI, OAEPH, RSA-KEM[L]], Prefix-MAC[LJ] and so on), where RO is modeled 
as a monolithic entity (i.e. a black box working in domain {0,1}*), in practice 
most instantiations that use a hash function are usually constructed by iterating 
a fixed input length primitive (e.g. a compression function). There are many 
architectures based on iterated hash functions. The most well-known one is the 
Merkle-Damgård (MD) construction [60]. A hash function with the MD construction
iterates an underlying compression function $f : \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^n$ as
follows.


$MD^f(m_1, \ldots, m_l)$ ($|m_i| = t$, $i = 1, \ldots, l$):
let $y_0 = IV$ be some n bit fixed value.
for $i = 1$ to $l$ do $y_i = f(y_{i-1}, m_i)$
return $y_l$
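The iteration above translates directly into code. In this minimal sketch the compression function `f` is a truncated SHA-256 call chosen purely as a stand-in, and the byte lengths `N_BYTES`, `T_BYTES` and the all-zero `IV` are illustrative assumptions.

```python
import hashlib

N_BYTES = 4            # toy chaining-value length n, in bytes (assumption)
T_BYTES = 4            # block length t, in bytes (assumption)
IV = bytes(N_BYTES)    # some fixed n-bit initial value


def f(y, m):
    """Toy compression function {0,1}^n x {0,1}^t -> {0,1}^n (illustrative stand-in)."""
    return hashlib.sha256(y + m).digest()[:N_BYTES]


def md(blocks, compress=f):
    """Plain Merkle-Damgard iteration: y_i = f(y_{i-1}, m_i), return y_l."""
    y = IV
    for m in blocks:
        if len(m) != T_BYTES:
            raise ValueError("every block must have length t")
        y = compress(y, m)
    return y


digest = md([b"abcd", b"efgh"])
```

No padding or length strengthening is applied here, which matches the plain iteration shown in the pseudocode.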


There is a significant gap between RO and hash functions, since hash functions
are constructed from a small primitive $f$ while RO is a monolithic random
function.

Coron et al. made important observations on the cryptosystems that use
the indifferentiability framework. They introduced the new iterated hash function
property of indifferentiability from RO. In this framework, the underlying primitive,
$G$, is a fixed input length random oracle (denoted here as FILRO or $h$) or an
ideal block cipher. We say that hash function $H^G$ is indifferentiable from RO if
there exists simulator S such that no distinguisher can distinguish $H^G$ from RO
(S mimics $G$). The distinguisher can access $RO/H^G$ and $S/G$; S can access RO.
A hash function that satisfies this property, $H^G$, behaves like RO. Therefore,
replacing the RO of any cryptosystem by $H^G$ does not destroy its security.

Coron et al. analyzed the indifferentiability from RO for several specific constructions.
For example, they have shown that $MD^h$ is not indifferentiable from
RO due to the extension attack which uses the following property: The output
value $z' = MD^h(M \| m)$ can be calculated by $c = h(z, m)$ where $z = MD^h(M)$,
so $z' = c$. On the other hand, no S can return the output value $z' = RO(M \| m)$
from query $(z, m)$ where $z = RO(M)$, since no S knows $z'$ from $z$ and $m$, and $z'$ is
chosen at random. Therefore, no S can simulate the extension attack. This result
implies that $MD^h$ does not behave like RO and there exists some cryptosystem
that is secure in the RO model but insecure under $MD^h$ due to the indifferentiability
framework. Their solution was to propose several constructions such as
Prefix-Free MD, chop MD, NMAC and HMAC. Hash functions with these constructions
are, under $h$, indifferentiable from RO. It seems impossible to prove
that the important original MD cryptosystem is secure.
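The extension property can be observed concretely. The sketch below models $h$ as a lazily sampled random function; the class name `LazyRandomFunction`, the 8-byte lengths, and the example blocks are illustrative assumptions.

```python
import os

N_BYTES = 8
IV = bytes(N_BYTES)


class LazyRandomFunction:
    """Random function sampled on demand: a fresh uniform answer per new input."""

    def __init__(self):
        self.table = {}

    def __call__(self, *key):
        if key not in self.table:
            self.table[key] = os.urandom(N_BYTES)
        return self.table[key]


h = LazyRandomFunction()          # plays the role of the FILRO h


def md_h(blocks):
    """MD^h: iterate h over the blocks, starting from the IV."""
    y = IV
    for m in blocks:
        y = h(y, m)
    return y


M = [b"block-01", b"block-02"]
m = b"suffix-1"

# Real world: knowing only z = MD^h(M), one query to h yields MD^h(M || m).
z = md_h(M)
c = h(z, m)
z_ext = md_h(M + [m])

# Ideal world without extra leakage: an answer chosen independently of RO
# matches RO(M || m) only by chance (probability 2^-64 in this toy).
RO = LazyRandomFunction()
independent_answer = LazyRandomFunction()(RO(*M), m)
ideal_match = independent_answer == RO(*M, m)
```

In the real world `c == z_ext` always holds, whereas a simulator that only sees `(z, m)` has no way to make its answer agree with `RO(M || m)`; this is the gap the distinguisher exploits.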


MD Construction Dead? The MD construction is among the most important 
foundations of modern cryptosystems PISI. There are two main reasons: 


1. MD construction is employed by many popular hash functions such as SHA-1 
and SHA-256, and 

2. MD construction is more efficient than other iterated hash functions such as 
Prefix-Free MD, and chop MD. 


Since $MD^h \not\sqsubset RO$ holds, there is some cryptosystem $C^*$ that is secure in the RO
model but insecure under $MD^h$. Thus the important question is "can we confirm
that a given cryptosystem is secure in the RO model and secure under $MD^h$?"
There might be several cryptosystems that remain secure when RO is replaced
by $MD^h$. If we can confirm this for many cryptosystems that are widely used,
the original MD construction remains alive in the indifferentiability framework!


Our Contribution. Since $MD^h \not\sqsubset RO$ holds, we modify RO such that $MD^h$ is
indifferentiable from the modified RO. Then we analyze cryptosystem security
within the modified RO model. Concretely, we adopt the following approach.


1. Find a variant $\widetilde{RO}$ of RO that leaks enough information such that S can
simulate the extension attack.

2. Prove that $MD^h \sqsubset \widetilde{RO}$ holds.

3. Prove the cryptosystem's security in the $\widetilde{RO}$ model.


Secure cryptosystems in the $\widetilde{RO}$ model are also secure under $MD^h$ due to the
indifferentiability framework. Therefore, we concentrate on proposing $\widetilde{RO}$ that
can support many applications.

First we propose Traceable Random Oracle TRO as $\widetilde{RO}$.


Traceable Random Oracle. Our proposal of TRO is motivated by the following 
points: 


— Applications of TRO hide the outputs of hash functions from adversaries. 
One example is OAEP encryption: Adversaries cannot know the outputs of 
the hash functions that are used for calculating a cipher text, since these 
values are hidden by a random value or a trapdoor one-way permutation. 

— TRO leaks useful information such that S can run the extension attack. 


By considering the above points, it is convenient for S to obtain useful information
from value $z$ which is the output of $RO(M)$. Thus we define TRO that leaks
input $M$ on query $z$ such that $RO(M) = z$. Since S can obtain value $M$ such that
$z = RO(M)$, S can learn value $z' = RO(M \| m)$ by using TRO. Therefore, S can
run the extension attack. We will prove that $MD^h \sqsubset TRO$ holds (Corollary J).
Since the hash function outputs for OAEP and variants of OAEP (e.g. OAEP+)
are hidden, adversaries cannot use TRO effectively. So we can easily confirm that
these cryptosystems are secure in the TRO model.
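How a simulator might use the trace interface can be sketched as follows. This is a hedged illustration rather than the simulator from the proof: the class names, the method `trace`, the fallback table for untraceable chaining values, and the treatment of the IV are all assumptions.

```python
import os

N_BYTES = 8
IV = bytes(N_BYTES)


class TraceableRandomOracle:
    """RO plus a trace interface that reveals M on query z with RO(M) = z."""

    def __init__(self):
        self.table = {}                       # message -> digest

    def ro(self, message):
        if message not in self.table:
            self.table[message] = os.urandom(N_BYTES)
        return self.table[message]

    def trace(self, z):
        """Return a previously queried M with RO(M) = z, or None if there is none."""
        return next((M for M, digest in self.table.items() if digest == z), None)


class Simulator:
    """Answers h(z, m) so that MD chains built from its answers agree with RO."""

    def __init__(self, tro):
        self.tro = tro
        self.fallback = {}                    # answers for untraceable (z, m)

    def h(self, z, m):
        if z == IV:
            return self.tro.ro(m)             # first block of a chain
        M = self.tro.trace(z)
        if M is not None:
            return self.tro.ro(M + m)         # extension: RO(M || m)
        if (z, m) not in self.fallback:
            self.fallback[(z, m)] = os.urandom(N_BYTES)
        return self.fallback[(z, m)]


tro = TraceableRandomOracle()
sim = Simulator(tro)

z = tro.ro(b"block-01block-02")
extended = sim.h(z, b"suffix-1")
chained = sim.h(sim.h(IV, b"block-01"), b"block-03")
```

The same trace query that helps the simulator also hands `M` to any adversary who knows `z`, which is why schemes that expose hash outputs cannot be analysed this way.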


Limitation of TRO. Though TRO can easily confirm the security of many cryptosystems
under $MD^h$, there are several cryptosystems whose security we cannot
confirm by TRO. For example, RSA-KEM is insecure in the TRO model
(Theorem [{). It is possible that there are cryptosystems that are secure under
$MD^h$ but insecure in the TRO model, because TRO leaks information beyond that
needed to simulate the extension attack. The essential information to simulate the
extension attack is just $z' = RO(M \| m)$, but TRO leaks $M$, which is not essential.

Our response is to propose Extension Attack Simulatable Random Oracle ERO
as $\widetilde{RO}$.


Extension Attack Simulatable Random Oracle. We define ERO that leaks just $z'$
($= RO(M \| m)$). By using ERO, S can run the extension attack, since S can know
$z'$. We will prove that $MD^h \sqsubset ERO$ holds (Theorem J). We will also prove that
RSA-KEM is secure in the ERO model (Theorem Bh). Therefore, we can confirm
RSA-KEM security under $MD^h$ by using ERO. Fortunately, $MD^h$ is equivalent to
ERO, since $ERO \sqsubset MD^h$ holds (Theorem Għ). Namely, any cryptosystem that is
secure under $MD^h$ is equally secure in the ERO model and vice versa. Therefore,
ERO is necessary and sufficient to confirm the security of cryptosystems under
$MD^h$. When we analyze a cryptosystem under $MD^h$, all that is needed is to prove
the cryptosystem's security in the ERO model.
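The difference to a trace interface is that only the extended digest is released. A minimal sketch, assuming an interface `eo(z, m)` and a consistent random answer when no matching `M` exists (both are illustrative choices, not the formal definition):

```python
import os

N_BYTES = 8


class ExtensionRandomOracle:
    """RO plus an extension interface that returns RO(M || m) but never M itself."""

    def __init__(self):
        self.table = {}        # message -> digest
        self.orphans = {}      # answers for (z, m) with no known preimage of z

    def ro(self, message):
        if message not in self.table:
            self.table[message] = os.urandom(N_BYTES)
        return self.table[message]

    def eo(self, z, m):
        M = next((M for M, digest in self.table.items() if digest == z), None)
        if M is not None:
            return self.ro(M + m)
        return self.orphans.setdefault((z, m), os.urandom(N_BYTES))


ero = ExtensionRandomOracle()
z = ero.ro(b"secret-message")
z_prime = ero.eo(z, b"suffix-1")
orphan = ero.eo(b"\x00" * N_BYTES, b"suffix-1")
```

The caller learns `z_prime`, which is all a simulator needs to answer an extension query, while the message behind `z` stays hidden.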


TRO vs. ERO. Since TRO leaks more information than ERO, we will prove
$ERO \sqsubset TRO$. Since ERO has wider applicability, we recommend that ERO be
used for cryptosystems whose security cannot be proven in the TRO model.


ERO vs. RO. Since ERO leaks several bits of information in permitting the
simulation of the extension attack, $RO \sqsubset ERO$ and $ERO \not\sqsubset RO$ explicitly hold.
As evidence of the separation between RO and ERO, we pick up prefix MAC
which is secure in the RO model, and prove that prefix MAC is insecure in the
ERO model (Theorem). Since ERO is equivalent to $MD^h$, prefix MAC is also
insecure in the $MD^h$ model.


Leaky Random Oracle. Leaky random oracle LRO was proposed by Yoneyama
et al. [13] but with a different motivation. LRO has a function that leaks all
query-response pairs of RO. In this paper, we will prove that $TRO \sqsubset LRO$ and
$LRO \not\sqsubset TRO$ hold. Therefore, all cryptosystems secure in the LRO model are also
secure in the TRO model and there is some cryptosystem that is insecure in the
LRO model but secure in the TRO model. Since FDH is secure in the LRO model
[13], FDH is secure under $MD^h$. Since OAEP is insecure in the LRO model [13]
and secure in the TRO model, OAEP is evidence of the separation between LRO
and TRO.


Remarks. First we compare LRO, TRO and ERO from the viewpoint of security
proofs of cryptosystems. LRO, TRO, and ERO consist of RO and the additional
oracle (denoted LO, TO and EO, respectively). Since LO leaks more information to
adversaries than TO, adversaries that are given LRO have more flexible strategies
than adversaries given TRO. That is, security proofs in the LRO model are more
complex than those in the TRO model. The same is true for TRO and ERO.

Finally, for the security proof of cryptosystem $C(MD^h)$ we compare the direct
proof in $MD^h$ with the proof via ERO. Since $MD^h$ has the MD structure, we
must consider this structure in the direct proof. On the other hand, since ERO
does not have this structure, we do not need to consider it. For example, we
must consider the events of inner collisions for $MD^h$ in the direct proof. However,
this is not necessary for the proof in the ERO model. Moreover, since we can
reuse existing proofs for the simulation of RO in the security proof in the ERO
model, we only consider the simulation of EO in the security proof. Therefore,
the security proof in the ERO model is easier than the direct proof in $MD^h$.
Since $ERO = MD^h$ holds, we can confirm a cryptosystem's security under $MD^h$
by proving its security in ERO, an easier task than a direct proof.


Related Works. Recently, Dodis et al. independently proposed a methodology
to salvage the original and modified MD constructions in many applications [7].
They found two properties: one is preimage awareness (PrA), and the other is
public-use random oracle (pub-RO). pub-RO is the same as LRO. The approach
of pub-RO is almost the same as our approach of LRO. Dodis et al. pointed out that
the security of cryptosystems that satisfy the following property can be easily
proven in the pub-RO model: all inputs of hash functions are public to the adversaries.
Therefore, PSS, the Fiat-Shamir signature scheme, and others are
easily proven to be secure in the pub-RO model by using existing proofs in the
RO model. Since LRO(pub-RO) $\not\sqsubset$ TRO and TRO $\sqsubset$ LRO(pub-RO) hold, TRO
and ERO have more applications than LRO(pub-RO) (e.g. OAEP is secure in
the TRO model but insecure in the pub-RO model). The approach of PrA is
interesting in that this approach can treat the case where the compression function
$f$ requirement is relaxed from FILRO to property PrA. It seems, however,
that this approach is not effective in saving the original MD construction, since
this approach modifies the MD construction by processing the output of the MD
construction by FILRO.


Cryptosystems Security under the Merkle-Damgård Hash Function.
PSS, Fiat-Shamir, and so on are secure under $MD^h$ thanks to pub-RO [7], OAEP
and variants of OAEP are secure under $MD^h$ thanks to TRO, and RSA-KEM is
secure under $MD^h$ thanks to ERO. Since many cryptosystems are secure under
$MD^h$, the original Merkle-Damgård construction is still alive!


2 Preliminaries 


2.1 Merkle-Damgard Construction 


We first give a short description of the Merkle-Damgard (MD) construction. 
Function $\mathrm{MD}^f : \{0,1\}^* \to \{0,1\}^n$ is built by iterating compression function
$f : \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^n$ as follows.

– $\mathrm{MD}^f(M)$:
1. calculate $M' = \mathrm{pad}(M)$ where pad is a padding function such that
   $\mathrm{pad} : \{0,1\}^* \to (\{0,1\}^t)^*$.
2. calculate $c_i = f(c_{i-1}, m_i)$ for $i = 1,\ldots,l$ where $|m_i| = t$ for $i = 1,\ldots,l$,
   $M' = m_1||\ldots||m_l$ and $c_0$ is an initial value (s.t. $|c_0| = n$).
3. return $c_l$.

In this paper we ignore the above padding function. This does not degrade
generality, so hereafter we discuss $\mathrm{MD}^f : (\{0,1\}^t)^* \to \{0,1\}^n$. We use random oracle
compression function h as f where $h : \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^n$. Thus we
discuss below hash function $\mathrm{MD}^h$ with MD construction using h.
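
The iteration above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block and chaining sizes (`T_BYTES`, `N_BYTES`), the all-zero `IV`, and the class name `CompressionOracle` are hypothetical choices, and the compression function h is modelled by lazy sampling rather than by any concrete hash.

```python
import secrets

N_BYTES = 4            # n: length of the chaining value (toy size, assumption)
T_BYTES = 4            # t: length of one message block (toy size, assumption)
IV = bytes(N_BYTES)    # c_0: an arbitrary fixed initial value


class CompressionOracle:
    """Fixed-input-length random oracle h, sampled lazily."""

    def __init__(self):
        self.table = {}

    def __call__(self, c, m):
        assert len(c) == N_BYTES and len(m) == T_BYTES
        if (c, m) not in self.table:
            self.table[(c, m)] = secrets.token_bytes(N_BYTES)
        return self.table[(c, m)]


def md(f, message, iv=IV):
    """Compute c_l by iterating c_i = f(c_{i-1}, m_i); padding is ignored as in the text."""
    assert len(message) % T_BYTES == 0
    c = iv
    for i in range(0, len(message), T_BYTES):
        c = f(c, message[i:i + T_BYTES])
    return c
```

For example, `md(h, b"abcdefgh")` with `h = CompressionOracle()` equals `h(h(IV, b"abcd"), b"efgh")`.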


2.2 Random Oracle 


$\mathrm{RO} : \{0,1\}^* \to \{0,1\}^n$ can be realized as follows. RO initially has the empty
hash list $L_{RO}$. On query M, if $\exists (M, z) \in L_{RO}$, it returns z. Otherwise, it chooses
$z \in \{0,1\}^n$ at random, adds (M, z) to $L_{RO}$, hereafter denoted by
$L_{RO} \leftarrow (M, z)$, and returns z.
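
A minimal sketch of this lazy-sampling realization follows. The class name `RandomOracle`, the attribute `L_RO`, and the 16-byte output length are illustrative assumptions, not part of the paper.

```python
import secrets


class RandomOracle:
    """Random oracle realized by lazy sampling with a hash list L_RO."""

    def __init__(self, n_bytes=16):
        self.n_bytes = n_bytes
        self.L_RO = {}                      # hash list: M -> z

    def query(self, M: bytes) -> bytes:
        if M not in self.L_RO:              # fresh query: sample z and record (M, z)
            self.L_RO[M] = secrets.token_bytes(self.n_bytes)
        return self.L_RO[M]                 # repeated query: return the stored z
```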


2.3 Leaky Random Oracle 


LRO was proposed by Yoneyama et al. [13]. LRO can be realized as follows. LRO
consists of RO and LO. On a leak query to LO, LO outputs the entire contents
of $L_{RO}$. We can define S that can simulate the extension attack by using LRO,
since S can know M from z by using LO and can know z' by posing M||m to
RO.


2.4 Indifferentiability 


The indifferentiability framework generalizes the fundamental concept of the
indistinguishability of two cryptosystems C(U) and C(V), where C(U) is the
cryptosystem C that invokes the underlying primitive U and C(V) is the cryptosystem
C that invokes the underlying primitive V. U and V have two interfaces: public
and private interfaces. Adversaries can only access the public interfaces and
honest parties (e.g. the cryptosystem C) can access only the private interface.
We denote the private interface of the system W by $W^{priv}$ and the public
interface of the system W by $W^{pub}$. The definition of indifferentiability is as
follows.

Definition 1. V is indifferentiable from U, denoted $V \sqsubset U$, if for any
distinguisher D with binary output (0 or 1) there is a polynomial time simulator S
such that $|\Pr[D^{V^{priv}, V^{pub}} \Rightarrow 1] - \Pr[D^{U^{priv}, S(U^{pub})} \Rightarrow 1]| < \epsilon$.
Simulator S has oracle access to $U^{pub}$ and runs in time at most $t_S$. Distinguisher D
runs in time at most $t_D$ and makes at most q queries. $\epsilon$ is negligible in security
parameter k.


This definition will allow us to use construction V instead of U in any cryptosys- 
tem C and retain the same level of provable security due to the indifferentiability 
theory of Maurer et al. D]. We denote “C(V) is at least as secure as C(U)” by 
$C(V) \succ C(U)$. Namely, $C(V) \succ C(U)$ denotes the case that if C(U) is secure, then
C(V) is secure. More strictly, $V \sqsubset U \Rightarrow C(V) \succ C(U)$ holds.


2.5 Extension Attack


Coron et al. showed that $\mathrm{MD}^h$ is not indifferentiable from RO due to the extension
attack. The extension attack targets $\mathrm{MD}^h$, where we can calculate a new hash
value from some hash value. Namely, $z' = \mathrm{MD}^h(M||m)$ can be calculated from
only z and m by $z' = h(z, m)$ where $z = \mathrm{MD}^h(M)$. Note that z' can be calculated
without using M. The differentiating attack based on the extension attack is as follows.
Let $O_a$ be $\mathrm{MD}^h$ or RO and let $O_b$ be h or S. First, a distinguisher poses M to $O_a$
and gets z from $O_a$. Second, he poses (z, m) to $O_b$ and gets c from $O_b$. Finally,
he poses M||m to $O_a$ and gets z' from $O_a$.

If $O_a = \mathrm{MD}^h$ and $O_b = h$, then z' = c, while if $O_a$ = RO and $O_b$ = S,
then $z' \neq c$. This is because no simulator can obtain the output value of
RO(M||m) from just (z, m), and the output value of RO(M||m) is independently
and randomly defined from c. Therefore, $\mathrm{MD}^h \not\sqsubset \mathrm{RO}$ holds.
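
The three-query distinguisher can be sketched as follows. This is a toy illustration, assuming 16-byte blocks and chaining values and an all-zero IV; the helper names (`lazy_table`, `real_world`, `ideal_world`) are hypothetical. The ideal world uses a naive simulator that answers with fresh random values, which stands in for "any simulator with access to RO only".

```python
import secrets

N = T = 16             # toy sizes in bytes (assumption)
IV = bytes(N)


def lazy_table(n_bytes):
    """Return a lazily sampled random function on arbitrary argument tuples."""
    table = {}

    def oracle(*key):
        if key not in table:
            table[key] = secrets.token_bytes(n_bytes)
        return table[key]

    return oracle


def md(h, M):
    c = IV
    for i in range(0, len(M), T):
        c = h(c, M[i:i + T])
    return c


def distinguisher(O_a, O_b):
    """Output 1 if z' = c (looks like MD^h with h), else 0."""
    M, m = secrets.token_bytes(T), secrets.token_bytes(T)
    z = O_a(M)             # first query
    c = O_b(z, m)          # second query
    z2 = O_a(M + m)        # third query
    return 1 if z2 == c else 0


def real_world():
    h = lazy_table(N)
    return distinguisher(lambda M: md(h, M), h)


def ideal_world():
    ro = lazy_table(N)
    sim = lazy_table(N)    # naive simulator without any view of the RO hash list
    return distinguisher(ro, sim)
```

In the real world the distinguisher always outputs 1; in the ideal world it outputs 1 only if two independent 128-bit values coincide.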


3 Variants of Random Oracles 


In this section, we will introduce several variants of random oracles in order for S 
to simulate the extension attack described above, and then show the relationships 
among these oracles within the indifferentiability framework. 


3.1 Definition of Variants of Random Oracles


Traceable Random Oracle: TRO consists of RO and TO. On trace query z,

1. If there exist pairs such that $(M_i, z) \in L_{RO}$ ($i = 1,\ldots,n$), it returns
   $(M_1,\ldots,M_n)$.
2. Otherwise, it returns $\perp$.

We can define S that can simulate the extension attack by using TRO, since S
can know M from z by using TO and can know z' by posing M||m to RO.
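
As a rough sketch of TRO and of how S might use it, consider the following. The class and function names are hypothetical, `None` plays the role of $\perp$, and the random fallback for an untraceable z is an assumption added for completeness.

```python
import secrets


class TraceableRandomOracle:
    """RO plus a trace oracle TO that reveals all preimages of a hash value."""

    def __init__(self, n_bytes=16):
        self.n_bytes = n_bytes
        self.L_RO = {}

    def ro(self, M):
        if M not in self.L_RO:
            self.L_RO[M] = secrets.token_bytes(self.n_bytes)
        return self.L_RO[M]

    def to(self, z):
        preimages = [M for M, out in self.L_RO.items() if out == z]
        return preimages or None    # None stands for the symbol "bottom"


def simulate_extension(tro, z, m):
    """Answer a query (z, m) so that it agrees with RO(M||m) when z = RO(M)."""
    traced = tro.to(z)
    if traced is None:
        # Assumption: no preimage is known, so answer with a fresh random value.
        return secrets.token_bytes(tro.n_bytes)
    return tro.ro(traced[0] + m)
```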


Extension Attack Simulatable Random Oracle: TRO leaks more information
than is needed to simulate the extension attack. So we define ERO such that S is given
just the important information. The important information is the value z' such that
z' = RO(M||m). Therefore, we define ERO as follows. ERO consists of RO and
EO. EO initially has the empty list $L_{EO}$ and can look into $L_{RO}$. On simulation
query (m, z) to EO where |m| = t,

1. If $(m, z, z') \in L_{EO}$, it returns z'.
2. Else if z = IV, EO poses query m to RO, receives z', $L_{EO} \leftarrow (m, z, z')$, and
   returns z'.
3. Else if there exists only one pair $(M, z) \in L_{RO}$, EO poses query M||m to RO,
   receives z', $L_{EO} \leftarrow (m, z, z')$, and returns z'.
4. Else EO chooses $z' \in \{0,1\}^n$ at random, $L_{EO} \leftarrow (m, z, z')$, and returns z'.

We can construct S that can simulate the extension attack by using ERO, since
S can obtain z' from (m, z) where z' = RO(M||m) by using EO.
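
The four steps of EO can be sketched as below. This is an illustrative reading of the definition, not the authors' code: the class name, the all-zero default IV, and the 16-byte output length are assumptions, and the length check |m| = t is omitted.

```python
import secrets


class ExtensionRandomOracle:
    """RO plus EO, where EO only reveals the value needed for an extension."""

    def __init__(self, n_bytes=16, iv=None):
        self.n_bytes = n_bytes
        self.iv = iv if iv is not None else bytes(n_bytes)
        self.L_RO = {}      # M -> z
        self.L_EO = {}      # (m, z) -> z'

    def ro(self, M):
        if M not in self.L_RO:
            self.L_RO[M] = secrets.token_bytes(self.n_bytes)
        return self.L_RO[M]

    def eo(self, m, z):
        if (m, z) in self.L_EO:                              # step 1
            return self.L_EO[(m, z)]
        preimages = [M for M, out in self.L_RO.items() if out == z]
        if z == self.iv:                                     # step 2
            z2 = self.ro(m)
        elif len(preimages) == 1:                            # step 3
            z2 = self.ro(preimages[0] + m)
        else:                                                # step 4
            z2 = secrets.token_bytes(self.n_bytes)
        self.L_EO[(m, z)] = z2
        return z2
```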


3.2 Relationships among LRO, TRO, ERO, and RO Models within
the Indifferentiability Framework


LRO leaks more information of $L_{RO}$ than TRO, and TRO leaks more information
of $L_{RO}$ than ERO. Therefore, it seems reasonable to suppose that anything secure
in the LRO model is also secure in the TRO model, anything secure in the TRO 
model is also secure in the ERO model, and any cryptosystem secure in the ERO 
model is also secure in the RO model. We prove the validity of these suppositions 
by using the indifferentiability framework. 

First we clarify the relationship between TRO and LRO. 


Theorem 1. $\mathrm{TRO} \sqsubset \mathrm{LRO}$ and $\mathrm{LRO} \not\sqsubset \mathrm{TRO}$.


Proof. We construct S which simulates TO by using LRO as follows. Given query
z, S poses a leak query to LO and receives the entire contents of $L_{RO}$. If
there exist pairs such that $(M_i, z) \in L_{RO}$ ($i = 1,\ldots,n$), it returns $(M_1,\ldots,M_n)$.
Otherwise it returns $\perp$.

It is easy to see that $|\Pr[D^{RO,TO} \Rightarrow 1] - \Pr[D^{RO,S(LRO)} \Rightarrow 1]| = 0$, since the
output from each step of S is equal to that from each step of TO.

$\mathrm{LRO} \not\sqsubset \mathrm{TRO}$ is trivial, since no S can acquire all values in $L_{RO}$ by using
TRO.


Since $\mathrm{TRO} \sqsubset \mathrm{LRO}$, any cryptosystem secure in the LRO model is also secure in
the TRO model by the indifferentiability framework. Since $\mathrm{LRO} \not\sqsubset \mathrm{TRO}$, there
exists some cryptosystem that is secure in the TRO model but insecure in the
LRO model. For example, Yoneyama et al. proved that OAEP is insecure in the
LRO model [13]. Since OAEP is secure in the TRO model, OAEP is evidence of
the separation between LRO and TRO.

Next we will clarify the relationship between ERO and TRO. 


Theorem 2. $\mathrm{ERO} \sqsubset \mathrm{TRO}$ and $\mathrm{TRO} \not\sqsubset \mathrm{ERO}$.


Proof. We construct S which simulates EO by using TRO as follows. S initially
has the empty list $L_S$. On query (m, z), if $\exists (m, z, z') \in L_S$, it returns z'. Otherwise
S poses query z to TO, and receives string X. If X consists of one value, it
poses query X||m to RO, receives z', $L_S \leftarrow (m, z, z')$, and returns z'. Otherwise,
it chooses $z' \in \{0,1\}^n$ at random, $L_S \leftarrow (m, z, z')$, and returns z'.

It is easy to see that $|\Pr[D^{RO,EO} \Rightarrow 1] - \Pr[D^{RO,S(TRO)} \Rightarrow 1]| = 0$, since the
output from each step of S is equal to that from each step of EO.

$\mathrm{TRO} \not\sqsubset \mathrm{ERO}$ is trivial, since no S can decide whether there exists (M, z)
in $L_{RO}$ or not by using ERO.


Since $\mathrm{ERO} \sqsubset \mathrm{TRO}$, any cryptosystem secure in the TRO model is also secure in
the ERO model by the indifferentiability framework. Since $\mathrm{TRO} \not\sqsubset \mathrm{ERO}$, there
exists some cryptosystem that is secure in the ERO model but insecure in the
TRO model. We will prove that RSA-KEM is secure in the ERO model but
insecure in the TRO model in Section 5. Therefore, RSA-KEM is evidence of the
separation between TRO and ERO.

Finally we will clarify the relationship between RO and ERO.


Theorem 3. $\mathrm{RO} \sqsubset \mathrm{ERO}$ and $\mathrm{ERO} \not\sqsubset \mathrm{RO}$.


The proof of Theorem 3 is trivial because ERO consists of RO and the additional
oracle EO, which leaks some information of $L_{RO}$. Since $\mathrm{RO} \sqsubset \mathrm{ERO}$, any
cryptosystem secure in the ERO model is also secure in the RO model by the
indifferentiability framework. Since $\mathrm{ERO} \not\sqsubset \mathrm{RO}$, there exists some cryptosystem
which is secure in the RO model but insecure in the ERO model. We can show
simple evidence of the separation between ERO and RO as follows: We consider
the following Prefix-MAC protocol which is unforgeable in the RO model. Note
that the concept of unforgeability with regard to MAC schemes is defined in [I]. 


Prefix MAC [12]: Alice and Bob share one secret key, K, as an authentication 
key. Before sending message M to Bob, Alice sends K||M to RO H to obtain
a MAC value denoted as y. Finally, Alice sends (M, y) to Bob. When Bob 
obtains (M, y), he sends K||M to H to obtain another MAC value y’. If y’ is 
equal to y, then Bob is convinced that message M is from Alice. Otherwise, 
Bob will reject message M. 


We will show that Prefix MAC fails to satisfy unforgeability for MAC schemes 
in the ERO model. 


Theorem 4 (Insecurity of Prefix MAC in the ERO model). Prefix MAC 
does not satisfy unforgeability for MAC schemes where H is modeled as ERO. 


Proof. A forgery procedure is as follows: forger F obtains a valid pair (M, h)
from MAC, where h = H(K||M). F sends (m, h) to EO, and obtains h' =
H(K||M||m). Since M||m is not queried to MAC, F succeeds in an existential
forgery under a known-message attack (EF-KMA) using ERO H.
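
The forgery can be replayed in a toy setting. The sketch below is an illustration under assumptions: H is modelled as an ERO with 16-byte outputs and an all-zero IV, the helper names (`mac`, `verify`, `forge`) are hypothetical, and the length restriction |m| = t on EO queries is not enforced.

```python
import secrets

N = 16
IV = bytes(N)
L_RO, L_EO = {}, {}


def RO(M):
    if M not in L_RO:
        L_RO[M] = secrets.token_bytes(N)
    return L_RO[M]


def EO(m, z):
    if (m, z) not in L_EO:
        pre = [M for M, out in L_RO.items() if out == z]
        if z == IV:
            L_EO[(m, z)] = RO(m)
        elif len(pre) == 1:
            L_EO[(m, z)] = RO(pre[0] + m)
        else:
            L_EO[(m, z)] = secrets.token_bytes(N)
    return L_EO[(m, z)]


K = secrets.token_bytes(16)      # shared secret key, never shown to the forger


def mac(M):
    return RO(K + M)             # Alice: y = H(K||M)


def verify(M, tag):
    return RO(K + M) == tag      # Bob: recompute and compare


def forge(M, tag, m):
    """Forger F: extend a known pair (M, tag) using only the EO interface."""
    return M + m, EO(m, tag)


M = b"pay 10 to Bob"
tag = mac(M)
forged_msg, forged_tag = forge(M, tag, b" and 99 to Eve")
```

Here `verify(forged_msg, forged_tag)` holds although F never learned K and `forged_msg` was never sent to the MAC oracle.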


Therefore, Prefix-MAC is secure in the RO model but insecure in the ERO model. 
Consequently, Prefix-MAC is evidence of the separation between ERO and RO. 
From the above discussions, the following corollary is obtained. 


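Corollary 1. $\mathrm{RO} \sqsubset \mathrm{ERO} \sqsubset \mathrm{TRO} \sqsubset \mathrm{LRO}$, and $\mathrm{LRO} \not\sqsubset \mathrm{TRO} \not\sqsubset \mathrm{ERO} \not\sqsubset \mathrm{RO}$.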


4 Relationship between $\mathrm{MD}^h$ and ERO in the
Indifferentiability Framework


In this section we prove that $\mathrm{MD}^h \sqsubset \mathrm{ERO}$ and $\mathrm{ERO} \sqsubset \mathrm{MD}^h$ hold as follows. In
Theorem 5 we use parameters $\sigma_H$ and $q_h$ instead of the total number of queries
q. $\sigma_H$ is the total number of message blocks for RO/$\mathrm{MD}^h$ and $q_h$ is the total
number of queries to S/h.

Theorem 5. $\mathrm{MD}^h \sqsubset \mathrm{ERO}$, for any $t_D$, with $t_S = O(q^2)$ and
$\epsilon \leq \frac{4(\sigma_H + q_h)^2 + 2(\sigma_H + q_h)}{2^n}$.


This proof is given in Subsection 4.1.

In Theorem 6 we use parameters $\sigma_H$ and $q_{EO}$ instead of the total number of
queries q. $\sigma_H$ is the total number of message blocks for RO/$\mathrm{MD}^h$ and $q_{EO}$ is the
total number of queries to EO/S.

Theorem 6. $\mathrm{ERO} \sqsubset \mathrm{MD}^h$, for any $t_D$, with $t_S = O(q_{EO})$ and
$\epsilon \leq \frac{2(\sigma_H + q_{EO})^2 + (\sigma_H + q_{EO})}{2^n}$.


This proof is given in Subsection 4.2.

From Theorem 5 and Theorem 6, ERO is equivalent to $\mathrm{MD}^h$ in the
indifferentiability framework. From Corollary 1, Theorem 5 and Theorem 6, the following
corollary is obtained.


Corollary 2. $\mathrm{RO} \sqsubset \mathrm{MD}^h = \mathrm{ERO} \sqsubset \mathrm{TRO} \sqsubset \mathrm{LRO}$, and
$\mathrm{LRO} \not\sqsubset \mathrm{TRO} \not\sqsubset \mathrm{ERO} = \mathrm{MD}^h \not\sqsubset \mathrm{RO}$.


4.1 Proof of Theorem 5


First we define simulator S as follows. S has a list T which is initially empty. We
define chain triples as follows.


Definition 2 (Chain Triples). Triples $(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i)$ are chain
triples if $x_1 = IV$ and $y_j = x_{j+1}$ ($j = 1, \ldots, i-1$) holds.


Simulator S: On a query (x, m),

1. If $\exists (x, m, y) \in T$, it outputs y.
2. Else if there exist chain triples $(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i) \in T$ such that
   $x = y_i$, $y \leftarrow RO(m_1||\ldots||m_i||m)$.
3. Else, $y \leftarrow EO(m, x)$.
4. $T \leftarrow (x, m, y)$.
5. S returns y.


Since S needs to search pairs in T, this requires at most $O(q^2)$ time.
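
One way to read the simulator is sketched below, together with a compact ERO so that the block runs on its own. The names (`ERO`, `SimulatorS`, `_chain_to`), the backward search for chain triples, and the toy sizes are assumptions made for illustration; the paper does not prescribe how the search is implemented.

```python
import secrets

N = 16
IV = bytes(N)


class ERO:
    def __init__(self):
        self.L_RO, self.L_EO = {}, {}

    def ro(self, M):
        if M not in self.L_RO:
            self.L_RO[M] = secrets.token_bytes(N)
        return self.L_RO[M]

    def eo(self, m, z):
        if (m, z) not in self.L_EO:
            pre = [M for M, out in self.L_RO.items() if out == z]
            if z == IV:
                self.L_EO[(m, z)] = self.ro(m)
            elif len(pre) == 1:
                self.L_EO[(m, z)] = self.ro(pre[0] + m)
            else:
                self.L_EO[(m, z)] = secrets.token_bytes(N)
        return self.L_EO[(m, z)]


class SimulatorS:
    def __init__(self, ero):
        self.ero = ero
        self.T = {}                          # (x, m) -> y

    def _chain_to(self, x):
        """Return m_1||...||m_i for chain triples with y_i = x, or None."""
        stack, seen = [(x, b"")], set()
        while stack:
            cur, suffix = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            for (px, pm), py in self.T.items():
                if py == cur:
                    if px == IV:             # the chain starts at IV
                        return pm + suffix
                    stack.append((px, pm + suffix))
        return None

    def query(self, x, m):
        if (x, m) in self.T:                 # step 1
            return self.T[(x, m)]
        prefix = self._chain_to(x)
        if prefix is not None:               # step 2
            y = self.ero.ro(prefix + m)
        else:                                # step 3
            y = self.ero.eo(m, x)
        self.T[(x, m)] = y                   # step 4
        return y                             # step 5
```

With this sketch, querying S along a chain starting at IV reproduces the RO value of the concatenated blocks, which is the property Lemma 1 relies on.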

We need to prove that D cannot tell apart two scenarios, ERO and $\mathrm{MD}^h$. In one
scenario D has oracle access to RO and S, while in the other D has access to $\mathrm{MD}^h$
and h. The proof involves a hybrid argument starting in the ERO scenario and
ending in the $\mathrm{MD}^h$ scenario through a sequence of mutually indistinguishable
hybrid games.

We give six events that allow D to distinguish $\mathrm{MD}^h$ from ERO. These events
arise from the fact that $\mathrm{MD}^h$ has the MD construction but ERO does not. We
explain these events as follows. Details of these events are given in Game 3.

First we discuss distinguishing events that occur due to differences between RO
and $\mathrm{MD}^h$. RO and $\mathrm{MD}^h$ return a random value unless a collision occurs. Therefore,
distinguishing events occur when a collision occurs. When a collision of $\mathrm{MD}^h$
occurs, one of the following events occurs due to the MD construction: an output of
h is equal to IV (event E1) or a collision of h occurs (event E2). On the other
hand, since RO is a monolithic function, these events don't occur. Therefore,
these events are distinguishing events between $\mathrm{MD}^h$ and ERO.

Second, we discuss distinguishing events that occur due to differences between
S and h. Since for h there is the relation that h(x, m) = RO(M||m) where
$\mathrm{MD}^h(M) = x$, S must simulate the relation such that S(x, m) = RO(M||m)
where RO(M) = x. On query (x, m) to S, if only one pair $(M, x) \in L_{RO}$ exists
such that $x \neq IV$ holds, S can know $\mathrm{MD}^h(M||m)$ by using EO. Therefore, S
can simulate the relation. If such a pair does not exist ($(M, x) \notin L_{RO}$), since S
cannot know M, S cannot know the value of RO(M||m). Therefore, S cannot
simulate the relation (event E3 and event E5). If two or more such pairs exist
($(M, x), (M', x), \ldots \in L_{RO}$), S must simulate the relation such that
$RO(M||m) = RO(M'||m) = \cdots$. However, since S cannot control the outputs of RO, it cannot
simulate the relation (event E4).

On the other hand, if $\exists (M, x) \in L_{RO}$ such that x = IV, S must simulate the
relation such that RO(m) = RO(M||m). However, since S cannot control the
outputs of RO, it cannot simulate the relation (event E6).

In the following game transformations, since the MD construction is considered in
Game 3 for the first time, we discuss these events in the transformation from Game 2
to Game 3. In this discussion, we show that if distinguishing events don't occur,
Game 3 is identical to Game 2, and the probability that one of the events will
occur is negligible.


Game 1: This is the random oracle model, where D has oracle access to RO
and S. Let G1 denote the event that D outputs 1 after interacting with RO and
S. Thus $\Pr[G1] = \Pr[D^{RO, S(ERO)} \Rightarrow 1]$.


Game 2: In this game, we give the distinguisher oracle access to a dummy relay
algorithm $R_0$ instead of direct oracle access to RO. $R_0$ is given oracle access to
RO. On query M to $R_0$, it queries M to RO and returns RO(M). Let G2 denote
the event that D outputs 1 in Game 2. Since the view of D remains unchanged
in this game, Pr[G2] = Pr[G1].


Game 3: In this game, we modify the relay algorithm $R_0$ into $R_1$ as follows.
For hash oracle query M, $R_1$ applies the MD construction to M by querying S.
$R_1$ is essentially the same as $\mathrm{MD}^h$ except that $R_1$ is based on S instead of the
fixed input length random oracle h.

We show that Game 3 is identical with Game 2 unless the following bad events
occur. In response to query (x, m), S chooses response $y \in \{0,1\}^n$:

– E1: It is the case that y = IV.
– E2: There is a triple $(x', m', y') \in T$, with $(x', m') \neq (x, m)$, such that y' = y.
– E3: There is a triple $(x', m', y') \in T$, with $(x', m') \neq (x, m)$, such that x' = y
  and (x', m', y') is defined by a step other than step 3 of EO.

and in a response to a query M to RO, RO returns z:

– E4: There is a pair $(M', z') \in L_{RO}$, with $M \neq M'$, such that z = z'.
– E5: There is a triple $(x', m', y') \in T$ such that z = x'.
– E6: z = IV.

We demonstrate that Game 3 is identical with Game 2 unless bad events occur
and the probability that bad events occur is negligible. Before we demonstrate
these facts, we give a useful property as follows.


Lemma 1. For any chain triples $(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i)$ in T,
$y_i = RO(m_1||\ldots||m_i)$ holds unless bad events occur.


Proof. To the contrary, assume that $y_i \neq RO(m_1||\ldots||m_i)$. Since $y_i$ is defined in step
2 of S (case A), step 2 of EO (case B), step 3 of EO (case C), or step 4 of EO
(case D), we show that when $y_i$ is defined in each step, bad events occur.

First, we discuss case A. We divide it into two cases: when $(x_i, m_i, y_i)$
is stored, other chain triples $(x'_1, m'_1, y'_1), \ldots, (x'_t, m'_t, y'_t)$ are already stored in
T such that $y'_t = y_{i-1}$ (case A-1), or such chain triples are not stored in T (case
A-2). Case A-1 is equal to a collision of $\mathrm{MD}^S$. Therefore a collision of S occurs
or an output of S is equal to IV in this case. Therefore event E1 or E2 occurs.
In case A-2, since $y_i = RO(m_1||\ldots||m_i)$ holds from the definition of S, this is
contrary to the assumption.

We discuss case B. We divide it into two cases: $i = 1$ (case B-1) and
$i \neq 1$ (case B-2). In case B-1, $y_1 = RO(m_1)$ holds due to the definition of S.
This is contrary to the assumption. In case B-2, since $x_i = IV$, $y_{i-1} = IV$
holds. Therefore event E1 or E6 occurs.

We discuss case C. In this case, $(M, x_i)$ is already in $L_{RO}$ when $y_i$
is defined. We consider two cases: $M = m_1||\ldots||m_{i-1}$ (case C-1) and
$M \neq m_1||\ldots||m_{i-1}$ (case C-2). In case C-1, $y_i = RO(m_1||\ldots||m_i)$ holds and this
is contrary to the assumption. In case C-2, we consider two cases: $y_{i-1}$ is
chosen at random by EO (case C-2-1) and $y_{i-1}$ is defined by RO (case C-2-2).
For case C-2-1, from the definition of S, when $(x_{i-1}, m_{i-1}, y_{i-1})$ is stored in
T, some triple $(x_j, m_j, y_j)$ is not in T. Assume that j is the maximum such index.
Therefore $y_{j+1}, \ldots, y_{i-1}$ are defined at random by EO and are independent of RO.
$(x_{j+1}, m_{j+1}, y_{j+1})$ is stored in T before $(x_j, m_j, y_j)$ is stored in T. If $y_j$ is defined
at random by EO and is independent of RO, event E3 occurs. If $y_j$ is defined by
RO ($y_j = RO(m_1||\ldots||m_j)$), event E5 occurs. Case C-2-2 is equal to event E4.

Finally we discuss case D. By the same discussion as in case C-2-1,
bad event E3 or E5 occurs.


For the view of D for $R_0$ and $R_1$, from Lemma 1, for any M, $R_1(M) = RO(M)$
holds unless bad events occur. Therefore the view of D for $R_0$ is equal to that
for $R_1$. For consistency in Game 2, from the definition of S and Lemma 1,
for any chain triples $(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i) \in T$,
$y_i = RO(m_1||\ldots||m_i) = R_0(m_1||\ldots||m_i)$ holds unless bad events occur. Therefore, the
answers given by S are consistent with those given by $R_0$. For consistency in Game 3, from
the definition of S, the definition of $R_1$ and Lemma 1, for any chain triples
$(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i) \in T$, $y_i = R_1(m_1||\ldots||m_i) = RO(m_1||\ldots||m_i)$ holds
unless bad events occur. Therefore, the answers given by S are consistent with
those given by $R_1$. Therefore, Game 3 is identical with Game 2 unless bad events
occur.

Next we examine the probability that bad events occur as follows.


2 2 
Lemma 2. Pr[E1VE2V E3 v E4 V E5 v E6] < “atetaetare where q is the 
maximum number of invoking the simulator and q2 is the maximum number of 
invoking RO.


Proof. We will examine each of the six events and bound their probability.


Since outputs of S are chosen at random, Pr[E1] < #. Since E2 is the event 


n n 2 
where a collision occurs, Pr[E2] < 1 — Sa tae a < 4. Since y is chosen 
2 
at random, the probability that event E3 < a. Since E4 is the event that a 


RO collision occurs, Pr[E4] < B, Since E5 is the event that a random value is 
equal to some fixed value, Pr[E5] < 4%. Since E6 is the event that a random 
value is equal to IV, Pr[E6] < #. Therefore Pr[E1 V E2 V E3 V E4 V E5 v E6] < 


Pr[E1] + Pr[E2] + Pr[E3] + Pr[E4] + Pr[E5] + Pr[E6] < ité tugetata, 


Let G3 denote the event that the distinguisher D outputs 1 in Game 3, B2 be the
event wherein $E1 \vee E2 \vee E3 \vee E4 \vee E5 \vee E6$ occurs in Game 2, and B3 be the event
wherein $E1 \vee E2 \vee E3 \vee E4 \vee E5 \vee E6$ occurs in Game 3. From Lemma 2, the
probability that bad events occur in Game 2 is less than
$\frac{\sigma_H^2 + 3q_h^2 + 3q_h\sigma_H + 2q_h + \sigma_H}{2^n}$, and
the probability that bad events occur in Game 3 is less than
$\frac{4(\sigma_H + q_h)^2 + 2(\sigma_H + q_h)}{2^n}$.
Therefore
$|\Pr[G3] - \Pr[G2]| = |\Pr[G3 \wedge B3] + \Pr[G3 \wedge \neg B3] - \Pr[G2 \wedge B2] - \Pr[G2 \wedge \neg B2]|$
$\leq |\Pr[G3|B3] \times \Pr[B3] - \Pr[G2|B2] \times \Pr[B2]| \leq \max\{\Pr[B2], \Pr[B3]\}$
$= \frac{4(\sigma_H + q_h)^2 + 2(\sigma_H + q_h)}{2^n}$.


Game 4: In this game, we modify simulator S to $S_1$. RO is removed from
simulator $S_1$ as follows.


Simulator $S_1$: On query (x, m),

1. If $\exists (x, m, y) \in T$, it responds with y.
2. Else $S_1$ chooses $y \leftarrow \{0,1\}^n$ at random.
3. $T \leftarrow (x, m, y)$.
4. $S_1$ responds with y.


The output of S is chosen at random or chosen by RO. Therefore, for any fresh
query to S, the response is chosen at random. Since RO is invoked only by S, no
D can access RO. Namely, no D can distinguish $S_1$ from S, though RO is removed in
$S_1$, so Game 4 is identical to Game 3. Let G4 denote the event that distinguisher
D outputs 1 in Game 4. Pr[G4] = Pr[G3] holds.


Game 5: This is the final game of our argument. Here we finally replace
$S_1$ with the fixed input length random oracle h. Let G5 denote the event that
distinguisher D outputs 1 in Game 5. Since for a new query $S_1$ responds with a
random value and for a repeated query $S_1$ responds with the repeated value, Game 5 is
identical to Game 4. Therefore, we can deduce that Pr[G5] = Pr[G4].

Now we can complete the proof of Theorem 5 by combining Games 1 to 5, and
observing that Game 1 is the same as the ERO scenario while Game 5 is the same as
the $\mathrm{MD}^h$ scenario. Hence we can deduce that
$\epsilon \leq \frac{4(\sigma_H + q_h)^2 + 2(\sigma_H + q_h)}{2^n}$.


4.2 Proof of Theorem 6


We define simulator S that simulates EO. S initially has the empty list $L_S$. On query
(m, z), S is defined as follows: $z' \leftarrow h(z, m)$, and it returns z'. The simulator's
running time is at most $O(q_{EO})$.

We need to prove that D cannot tell apart two scenarios, the $\mathrm{MD}^h$ and ERO
scenarios: one where D has oracle access to $\mathrm{MD}^h$ and S, and the other where D
has access to RO and EO. The proof involves a hybrid argument starting in the
$\mathrm{MD}^h$ scenario and ending in the ERO scenario through a sequence of mutually
indistinguishable hybrid games.


Game 1: This is the $\mathrm{MD}^h$ scenario, where D has oracle access to $\mathrm{MD}^h$ and
S(h). Let G1 denote the event that D outputs 1 after interacting with $\mathrm{MD}^h$ and
S(h). Thus $\Pr[G1] = \Pr[D^{\mathrm{MD}^h, S(h)} \Rightarrow 1]$.


Game 2: In this game, we change the underlying primitive of MD from h
to S. Thus D interacts with $\mathrm{MD}^S$ and S(h). For any query to S, S poses it
to h and returns the value received from h. Let G2 denote the event that D
outputs 1 in Game 2. Since the view of D remains unchanged in this game,
Pr[G2] = Pr[G1].


Game 3: In this game, we remove S and h and insert EO and RO. In this
game, D interacts with $\mathrm{MD}^{EO}$ and EO and does not access RO. Since for a
fresh query EO returns a fresh random value and for a repeated query EO returns
the corresponding value, Game 3 is identical with Game 2. Let G3 denote the
event that D outputs 1 in Game 3. Since the view of D remains unchanged in
this game, Pr[G3] = Pr[G2].


Game 4: This is the final game of our argument. In this game, we remove
$\mathrm{MD}^{EO}$ and D interacts with RO and EO. We show that Game 4 is identical with
Game 3 unless the following bad events occur, and that the probability that bad events occur
is negligible.

Bad events are as follows. On query (m, x), EO returns y:

– Bad1: y = IV.

On query M, RO returns z:

– Bad2: There is a pair (M', z') in $L_{RO}$, with $M \neq M'$, such that z = z'.
– Bad3: There is a triple (m, x, y) in $L_{EO}$ such that z = x.


We demonstrate that Game 4 is identical with Game 3 unless bad events occur
and the probability that bad events occur is negligible. Before we demonstrate
these facts, we give a useful property as follows.


Lemma 3. For any chain triples $(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i)$ in $L_{EO}$,
$y_i = RO(m_1||\ldots||m_i)$ holds unless bad events occur.


Due to lack of space, we omit this proof. We will show this in the full version.
For the view of D for $\mathrm{MD}^{EO}$ and RO, from Lemma 3, the view of D for
$\mathrm{MD}^{EO}$ is equal to that for RO. For consistency in Game 3, from the definition of
MD and Lemma 3, for any chain triples $(m_1, x_1, y_1), \ldots, (m_i, x_i, y_i) \in L_{EO}$,
$y_i = RO(m_1||\ldots||m_i) = \mathrm{MD}^{EO}(m_1||\ldots||m_i)$ holds unless bad events occur. Therefore,
the answers given by S are consistent with those given by $\mathrm{MD}^{EO}$. For consistency
in Game 4, from Lemma 3, for any chain triples $(x_1, m_1, y_1), \ldots, (x_i, m_i, y_i) \in L_{EO}$,
$y_i = RO(m_1||\ldots||m_i)$ holds unless bad events occur. Therefore, the answers
given by S are consistent with those given by RO. Therefore, Game 4 is identical
with Game 3 unless bad events occur.

Next we examine the probability that bad events occur as follows. 


Lemma 4. $\Pr[Bad1 \vee Bad2 \vee Bad3] \leq \frac{q_1 + q_2^2 + q_1 q_2}{2^n}$, where $q_1$ is the maximum
number of invocations of EO and $q_2$ is the maximum number of invocations of RO.


Due to lack of space we omit this proof. 

Let G4 denote the event that the distinguisher D outputs 1 in Game 4, B3
be the event that $Bad1 \vee Bad2 \vee Bad3$ occurs in Game 3, and B4 be the event
that $Bad1 \vee Bad2 \vee Bad3$ occurs in Game 4. Therefore
$|\Pr[G4] - \Pr[G3]| \leq \max\{\Pr[B3], \Pr[B4]\} = \frac{2(\sigma_H + q_{EO})^2 + (\sigma_H + q_{EO})}{2^n}$.


4.3 MGF1 Transform 


In the above discussions, we ignored range extension algorithms such as MGF1,
which is an instantiated hash function of OAEP. When we consider these
algorithms, we need to modify TRO and ERO. Due to the lack of space, we only
modify TRO for MGF1 as follows and will discuss ERO in the full paper.

Let $H : \{0,1\}^* \to \{0,1\}^n$ be some hash function and let
$\mathrm{MGF1} : \{0,1\}^* \to \{0,1\}^{jn}$ be $H(M||[1])||H(M||[2])||\ldots||H(M||[j])$, where M is the
input of the hash function and [s] is the encoding value of s. We confirm the security of
cryptosystems that use the MGF1 transform with $\mathrm{MD}^h$ by the following approach.

– Propose the modification of TRO (denoted TRO', which consists of random
  oracle $RO' : \{0,1\}^* \to \{0,1\}^{jn}$ and TO of RO') such that
  $\mathrm{MGF1}(\mathrm{TRO}) \sqsubset \mathrm{TRO}'$.
– Prove the cryptosystem's security in the TRO' model.

If we can find the above TRO', since $\mathrm{MD}^h \sqsubset \mathrm{TRO}$, cryptosystems that are secure in
the TRO' model are secure under $\mathrm{MD}^h$.

TRO' is as follows. TRO' consists of random oracle $RO' : \{0,1\}^* \to \{0,1\}^{jn}$
and TO', a variant of TO. Let z[s] be the s-th block of z. On trace query (j, w)
to TO',

– If there exist pairs $(M, z) \in L_{RO'}$ such that z[j] = w, TO' returns
  all such pairs.
– Otherwise, TO' returns $\perp$.

When H is a random oracle, we can see $H(\cdot||[1]), \ldots, H(\cdot||[j])$ as independent
random oracles $RO_1, \ldots, RO_j$. In order to prove $\mathrm{MGF1}(\mathrm{TRO}) \sqsubset \mathrm{TRO}'$, we need
to find a simulator that simulates each TO of $RO_1, \ldots, RO_j$. The simulator of TO
of $RO_s$ can be easily shown by using queries $(s, \cdot)$ to TO'. Therefore, we can
prove $\mathrm{MGF1}(\mathrm{TRO}) \sqsubset \mathrm{TRO}'$.

Cryptosystems that are secure in the TRO model are also secure in the TRO'
model by discussions similar to those for the cases of TRO. Note that the security
bound of these cryptosystems is dependent on n, not jn.
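
The range extension and the block-wise trace oracle can be sketched as follows. Several details here are assumptions for illustration only: SHA-256 stands in for H, [s] is encoded as a 4-byte big-endian counter starting at 1 as in the description above (the MGF1 of PKCS #1 starts its counter at 0), and the names `mgf1_like` and `TROPrime` are hypothetical.

```python
import hashlib
import secrets


def mgf1_like(M: bytes, j: int) -> bytes:
    """Return H(M||[1]) || ... || H(M||[j]) with SHA-256 standing in for H."""
    return b"".join(
        hashlib.sha256(M + s.to_bytes(4, "big")).digest() for s in range(1, j + 1)
    )


class TROPrime:
    """RO' with jn-bit outputs plus TO', which traces on a single output block."""

    def __init__(self, j, n_bytes=16):
        self.j, self.n = j, n_bytes
        self.L_RO = {}

    def ro(self, M):
        if M not in self.L_RO:
            self.L_RO[M] = secrets.token_bytes(self.j * self.n)
        return self.L_RO[M]

    def to(self, s, w):
        """Trace query (s, w): all pairs (M, z) whose s-th block (1-indexed) is w."""
        lo, hi = (s - 1) * self.n, s * self.n
        hits = [(M, z) for M, z in self.L_RO.items() if z[lo:hi] == w]
        return hits or None      # None stands for the symbol "bottom"
```

A simulator for the TO of $RO_s$ would forward a trace query w as `to(s, w)` and keep only the messages M from the returned pairs.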

The same discussion can be applied to KDF3 which is an instantiated hash 
function of RSA-KEM[I]]. 


5 Security Analysis of RSA-KEM in TRO and ERO 
Models 


The RSA-based key encapsulation mechanism (RSA-KEM) scheme [I] is a se- 
cure KEM scheme in the RO model. In this section, we consider the security of 
RSA-KEM in the TRO and ERO models. 

The notation of the scheme follows that in [11]. The security of RSA-KEM in
the RO model is proved as follows:


Lemma 5 (Security of RSA-KEM in the RO model [[J]). If the RSA 
problem is hard, then RSA-KEM satisfies IND-CCA for KEM where KDF is
modeled as RO.


5.1 Insecurity of RSA-KEM in TRO Model 


Though RSA-KEM is secure in the RO model, it is insecure in the TRO model. 
More specifically, we can show that RSA-KEM does not even satisfy IND-CPA 
for KEM in the TRO model. Note that IND-CPA means IND-CCA without DO. 


Theorem 7 (Insecurity of RSA-KEM in the TRO model). Even if the
RSA problem is hard, RSA-KEM does not satisfy IND-CPA for KEM where
KDF is modeled as TRO.


Proof. We construct an adversary, A, which successfully plays the IND-CPA game by
using TRO KDF. The construction of A is as follows:

Input: (n, e) as the public key

Output: b' as the guessed bit

Step 1: Return state and receive $(K_b^*, C_0^*)$ as the challenge. Pose the trace
query $K_b^*$ to KDF, and obtain {r}.

Step 2: For all r in {r}, check whether $r^e \equiv C_0^* \pmod{n}$. If there is r* that
satisfies the relation, output b' = 0. Otherwise, output b' = 1.


We estimate the success probability of A. When challenge ciphertext $C_0^*$ is
generated, r* such that $K_0^* = KDF(r^*)$ is certainly posed to KDF because $C_0^*$ is
generated following the protocol description. Thus, $L_{KDF}$ contains $(r^*, C_0^*, K_0^*)$.
If $(r^*, C_0^*, K_b^*)$ is not in $L_{KDF}$, then b = 1. Therefore, A can successfully play
the IND-CPA game.
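
The adversary A can be replayed with toy parameters. The sketch below is an illustration under assumptions: the RSA modulus is far too small for real use, the encapsulation follows the usual RSA-KEM shape (ciphertext $r^e \bmod n$, key KDF(r)), and the helper names (`kdf`, `trace`, `challenge`, `adversary`) are hypothetical.

```python
import secrets

p, q, e = 61, 53, 17           # toy RSA parameters (assumption, insecure sizes)
n = p * q

L_KDF = {}                     # hash list of the KDF oracle: r -> K


def kdf(r):
    if r not in L_KDF:
        L_KDF[r] = secrets.token_bytes(16)
    return L_KDF[r]


def trace(K):
    """Trace query to the TO of KDF; an empty list plays the role of 'bottom'."""
    return [r for r, out in L_KDF.items() if out == K]


def encapsulate():
    r = secrets.randbelow(n)
    return kdf(r), pow(r, e, n)            # (K_0, C_0)


def challenge(b):
    K0, C0 = encapsulate()
    K1 = secrets.token_bytes(16)           # independent random key
    return (K0 if b == 0 else K1), C0


def adversary(K_b, C0):
    """Step 1: trace K_b. Step 2: test r^e = C0 (mod n) for every traced r."""
    for r in trace(K_b):
        if pow(r, e, n) == C0:
            return 0
    return 1
```

In this toy run the guess is wrong only if a random 128-bit key happens to collide with a recorded KDF output.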


5.2 Security of RSA-KEM in ERO Model 


We can also prove the security of RSA-KEM in the ERO model as well as in the
RO model.

Theorem 8 (Security of RSA-KEM in the ERO model). If the RSA problem
is $(t', \epsilon')$-hard, then RSA-KEM satisfies $(t, \epsilon)$-IND-CCA for KEM as follows:
$t' = t + (q_{RKDF} + q_{EKDF}) \cdot expo$, $\epsilon' \geq \epsilon - \frac{q_D}{n}$, where KDF is modeled as ERO,
$q_{RKDF}$ is the number of hash queries posed to the RO of KDF, $q_{EKDF}$ is the
number of extension attack queries posed to the EO of KDF, $q_D$ is the number
of queries posed to the decryption oracle DO, and expo is the running time of
exponentiation modulo n.


The proof will be described in the full paper. 


Acknowledgements. We would like to thank the anonymous referees for their


On the Analysis of Cryptographic Assumptions
in the Generic Ring Model*


Tibor Jager and Jörg Schwenk


Abstract. At Eurocrypt 2009 Aggarwal and Maurer proved that breaking
RSA is equivalent to factoring in the generic ring model. This model
captures algorithms that may exploit the full algebraic structure of the 
ring of integers modulo n, but no properties of the given representation of 
ring elements. This interesting result raises the question how to interpret 
proofs in the generic ring model. For instance, one may be tempted to 
deduce that a proof in the generic model gives some evidence that solving 
the considered problem is also hard in a general model of computation. 
But is this reasonable? 

We prove that computing the Jacobi symbol is equivalent to factoring 
in the generic ring model. Since there are simple and efficient non-generic 
algorithms computing the Jacobi symbol, we show that the generic model 
cannot give any evidence towards the hardness of a computational prob- 
lem. Despite this negative result, we also argue why proofs in the generic 
ring model are still interesting, and show that solving the quadratic 
residuosity and subgroup decision problems is generically equivalent to 
factoring. 


1 Introduction 


The security of asymmetric cryptographic systems relies on assumptions that 
certain computational problems, mostly from number theory and algebra, are 
intractable. Since proving useful lower complexity bounds in a general model of 
computation seems to be impossible with currently available techniques, these 
assumptions have been analyzed in restricted models, see BJLZBII], for instance. 
A natural and very general class of algorithms is considered in the generic ring 
model. This model captures all algorithms solving problems defined over an alge- 
braic ring without exploiting specific properties of a given representation of ring 
elements. Such algorithms work in a similar way for arbitrary representations of 
ring elements, thus are generic. 

* This is an extended abstract, the full version is available on eprint [13]. Supported
by the European Community (FP7/2007-2013), grant ICT-2007-216646 - European
Network of Excellence in Cryptology II (ECRYPT II).

Considering fundamental cryptographic problems in the generic model is motivated
by the following ideas. First, showing that a cryptographic assumption
holds with respect to a restricted but meaningful class of algorithms might
indicate that the idea of basing the security of cryptosystems on this assumption is
not totally flawed, and may therefore be seen as evidence that the assumption 
is also valid in a general model of computation. Second, showing that a large 
class of algorithms is not able to solve a computational problem efficiently is an 
important insight for the search for cryptanalytic algorithms, and can be used 
to deduce the optimality of certain classes of algorithms. Moreover, the generic 
model is a valuable tool to study the relationship among computational prob- 
lems, such as the equivalence of the discrete logarithm and the Diffie-Hellman 
problem, as done in [BESMIMGE], for instance. 

In this paper we prove a general theorem which states that solving certain 
subset membership problems in the ring Zn is equivalent to factoring n. This 
main theorem allows us to provide an example for a computational problem with 
high cryptographic relevance which is easy to solve in general, but equivalent to 
factoring in the generic model. Concretely, we show that computing the Jacobi 
symbol is equivalent to factoring in the generic ring model. 

For many common idealized models in cryptography it has been shown that 
a cryptographic reduction in the ideal model need not guarantee security in 
the “real world”. Well-known examples are, for instance, the random oracle 
model [9], the ideal cipher model [3], and the generic group model [2I]. All 
these results have in common that they used somewhat contrived constructions 
that deviate from standard cryptographic practice. In contrast, our result on the
generic equivalence of computing the Jacobi symbol and factoring is an example 
for a truly practical computational problem that is provably hard in the generic 
model, but easy to solve in general. This is an important aspect for interpreting 
results in the generic ring model, like PBID. Thus a proof in the generic 
model is unfortunately not even an indicator that the considered problem is 
indeed useful for cryptographic applications. 

This negative result does not affect the other mentioned motivations for the 
analysis of computational problems in the generic ring model. A lower bound 
in this model allows one to deduce the optimality of certain classes of algorithms,
and gives insight into the relationship between cryptographic problems, which is 
also of interest. Motivated by this fact, we also show that solving the quadratic 
residuosity and subgroup decision problems is generically equivalent to factoring. 
For the latter problem we show that the equivalence holds even in presence of a 
Diffie-Hellman oracle. Thus, a Diffie-Hellman oracle does not help in solving the 
subgroup decision problem. 

By taking a closer look at the construction of the simulator used in the proof 
of our main theorem, we furthermore deduce that for a certain class of compu- 
tational problems there exists an efficient generic ring algorithm if and only if 
there is an efficient straight line program solving the problem. 


1 An exception is the result of BO], showing a (non-generic) attack on a scheme with
provable security in the generic model. However, [A] note that this stems not from
a weakness in the generic model, but from an incorrect security proof.

1.1 Related Work


Previous work considering fundamental cryptographic assumptions in the generic 
model considered primarily discrete logarithm-based problems and the RSA 
problem. Starting with Shoup’s seminal paper [22], it was proven that solv- 
ing the discrete logarithm problem, the Diffie-Hellman problem, and related 
problems is hard with respect to generic group algorithms. Damgård
and Koprowski showed the generic intractability of root extraction in groups of
hidden order [10].

Brown [8] reduced the problem of factoring integers to solving the low-exponent
RSA problem with straight line programs, which are a subclass of generic ring
algorithms. Leander and Rupp augmented this result to generic ring algorithms,
where the considered algorithms may only perform the operations addition,
subtraction and multiplication modulo n, but not multiplicative inversion
operations. Recently, Aggarwal and Maurer [1] extended this result from
low-exponent RSA to full RSA and to generic ring algorithms that may also
compute multiplicative inverses. Boneh and Venkatesan [Z] have shown that there is
no straight line program reducing integer factorization to the low-exponent RSA 
problem, unless factoring integers is easy. 

The notion of generic ring algorithms has also been applied to study the 
relationship between the discrete logarithm and the Diffie-Hellman problem and 
the existence of ring-homomorphic encryption schemes [BEB]. 


2 Preliminaries 


2.1 Notation 


For a set $A$ and a probability distribution $D$ on $A$, we denote with
$a \xleftarrow{D} A$ the action of sampling an element $a$ from $A$ according to
distribution $D$. We denote with $U$ the uniform distribution. When sampling $k$
elements $a_1, \ldots, a_k \xleftarrow{D} A$, we assume that all elements are
chosen independently.

Throughout the paper we let $n$ be the product of at least two different primes,
and denote with $n = \prod_{i=1}^{k} p_i^{e_i}$ the prime factor decomposition of
$n$ such that $\gcd(p_i^{e_i}, p_j^{e_j}) = 1$ for $i \neq j$.

Let $P = (S_1, \ldots, S_m)$ be a finite sequence. Then $|P|$ denotes the length
of $P$, i.e. $|P| = m$. For $k \le m$ we denote with $P_k$ the subsequence
$(S_1, \ldots, S_k)$ of $P$. For sequences $P_k$ and $P$ we write
$P_k \sqsubseteq P$ to denote that $P_k$ is a subsequence of $P$ such that $P_k$
consists of the first $|P_k|$ elements of $P$.


2.2 Uniform Closure 


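By the Chinese Remainder Theorem, for $n = \prod_{i=1}^{k} p_i^{e_i}$ the ring
$\mathbb{Z}_n$ is isomorphic to the direct product of rings
$\mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_k^{e_k}}$. Let $\phi$ be the
isomorphism $\mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_k^{e_k}} \to \mathbb{Z}_n$,
and for $C \subseteq \mathbb{Z}_n$ let $C_i := \{ y \bmod p_i^{e_i} \mid y \in C \}$ for
$1 \le i \le k$.

Definition 1 (Uniform Closure). We say that $\mathcal{U}[C] \subseteq \mathbb{Z}_n$ is the uniform
closure of $C \subseteq \mathbb{Z}_n$, if

\[ \mathcal{U}[C] = \{ y \in \mathbb{Z}_n \mid y = \phi(y_1, \ldots, y_k),\ y_i \in C_i \text{ for } 1 \le i \le k \}. \]

In particular note that $C \subseteq \mathcal{U}[C]$, but not necessarily $\mathcal{U}[C] \subseteq C$. The following
lemma follows directly from the above definition.

Lemma 1. Sampling $y \xleftarrow{U} \mathcal{U}[C]$ uniformly random from $\mathcal{U}[C]$ is equivalent to
sampling $y_i$ uniformly and independently from $C_i$ for $1 \le i \le k$ and setting

\[ y = \phi(y_1, \ldots, y_k). \]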
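
The definition of the uniform closure can be made concrete with a small brute-force sketch in Python. The function name `uniform_closure` and the toy parameters (n = 15 with prime powers 3 and 5, C = {1, 14}) are illustrative assumptions and do not appear in the paper. The sketch enumerates all of Z_n, so it is only meant for tiny moduli.

```python
from math import prod

def uniform_closure(C, prime_powers):
    """Return U[C] for C, a subset of Z_n, where n is the product of the
    pairwise coprime prime powers listed in `prime_powers` (assumed input)."""
    n = prod(prime_powers)
    # C_i: the residues of the elements of C modulo the i-th prime power.
    residues = [{y % q for y in C} for q in prime_powers]
    # y lies in U[C] iff every CRT component y mod q lies in the matching C_i.
    return {y for y in range(n)
            if all(y % q in Ci for q, Ci in zip(prime_powers, residues))}

C = {1, 14}
U = uniform_closure(C, [3, 5])
print(sorted(U))  # [1, 4, 11, 14]
```

Here C is strictly contained in U[C]: the components (1 mod 3, 4 mod 5) and (2 mod 3, 1 mod 5) are recombined into 4 and 11, which are not in C.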


2.3 Straight Line Programs 


A straight line program over a ring R is a generic ring algorithm performing a 
fixed sequence of ring operations, without branching, that outputs an element 
of R. Thus straight line programs are a subclass of generic ring algorithms. 
The following definition is a simple extension of [8, Definition 1] to straight line
programs that may also compute multiplicative inverses.


Definition 2 (Straight Line Programs). A straight line program $P$ of length
$m$ over $\mathbb{Z}_n$ is a sequence of tuples

\[ P = ((i_1, j_1, \circ_1), \ldots, (i_m, j_m, \circ_m)) \]

where $-1 \le i_k, j_k < k$ and $\circ_k \in \{+, -, \cdot, /\}$ for $k \in \{1, \ldots, m\}$. The output $P(x)$
of straight line program $P$ on input $x \in \mathbb{Z}_n$ is computed as follows.


1. Initialize $L_{-1} := 1 \in \mathbb{Z}_n$ and $L_0 := x$.

2. For $k$ from 1 to $m$ do:
— if $\circ_k = /$ and $L_{j_k} \notin \mathbb{Z}_n^*$, then return $\bot$,
— else set $L_k := L_{i_k} \circ_k L_{j_k}$.

3. Return $P(x) = L_m$.


We say that each triple $(i, j, \circ) \in P$ is a SLP-step.


For notational convenience, for a given straight line program $P$ we will denote
with $P_k$ the straight line program given by the sequence of the first $k$ elements of
$P$, with the additional convention that $P_{-1}(x) = 1$ and $P_0(x) = x$ for all $x \in \mathbb{Z}_n$.
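
The evaluation rule of Definition 2 can be sketched in a few lines of Python. The names `eval_slp` and `BOT`, the encoding of SLP-steps as `(i, j, op)` tuples with `"*"` for the ring multiplication, and the example program are assumptions made for this illustration only.

```python
from math import gcd

BOT = None  # stands in for the error symbol of Definition 2

def eval_slp(P, x, n):
    """Evaluate the straight line program P, a list of (i, j, op) steps, on x in Z_n."""
    L = {-1: 1 % n, 0: x % n}            # L_{-1} = 1 and L_0 = x
    for k, (i, j, op) in enumerate(P, start=1):
        a, b = L[i], L[j]                # requires -1 <= i, j < k
        if op == "+":
            L[k] = (a + b) % n
        elif op == "-":
            L[k] = (a - b) % n
        elif op == "*":
            L[k] = (a * b) % n
        elif op == "/":
            if gcd(b, n) != 1:           # L_j is not a unit of Z_n
                return BOT
            L[k] = (a * pow(b, -1, n)) % n   # modular inverse, Python 3.8+
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return L[len(P)]                     # an empty P yields P_0(x) = x

# Hypothetical example: P computes 1 / (x^2 - 1) over Z_15.
P = [(0, 0, "*"), (1, -1, "-"), (-1, 2, "/")]
print(eval_slp(P, 3, 15))  # 2, since 3^2 - 1 = 8 and 8 * 2 = 16 = 1 mod 15
print(eval_slp(P, 2, 15))  # None, since 2^2 - 1 = 3 is not a unit modulo 15
```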


2.4 Generic Ring Algorithms 


Similar to straight line programs, generic ring algorithms perform a sequence 
of ring operations on the input values $1, x \in \mathbb{Z}_n$. However, while straight line
programs perform the same fixed sequence of ring operations on any input value,
generic ring algorithms can decide adaptively which ring operation is performed 
next. The decision is made either based on equality checks, or on coin tosses. 
Moreover, the output of generic ring algorithms is not restricted to ring elements. 




We formalize the notion of generic ring algorithms in terms of a game between
an algorithm $\mathcal{A}$ and a black-box $\mathcal{O}$, the generic ring oracle. The generic ring
oracle receives as input a secret value $x \in \mathbb{Z}_n$. It maintains a sequence $P$, which
is set to the empty sequence at the beginning of the game, and implements two
internal subroutines test() and equal().


— The test()-procedure takes a tuple $(j, \circ) \in \{-1, \ldots, |P|\} \times \{+, -, \cdot, /\}$ as
input. The procedure returns false if $\circ = /$ and $P_j(x) \notin \mathbb{Z}_n^*$, and true otherwise.

— The equal()-procedure takes a tuple $(i, j) \in \{-1, \ldots, |P|\} \times \{-1, \ldots, |P|\}$
as input. The procedure returns true if $P_i(x) \equiv P_j(x) \bmod n$ and false
otherwise.


In order to perform computations, the algorithm submits SLP-steps to $\mathcal{O}$.
Whenever the algorithm submits $(i, j, \circ)$ with $\circ \in \{+, -, \cdot, /\}$, the oracle runs
test$(j, \circ)$. If test$(j, \circ)$ = false, the oracle returns the error symbol $\bot$. Otherwise
$(i, j, \circ)$ is appended to $P$. Moreover, the algorithm can query the oracle to check
for equality of computed ring elements by submitting a query $(i, j, \circ)$ such that
$\circ \in \{=\}$. In this case the oracle returns equal$(i, j)$. We measure the complexity
of $\mathcal{A}$ by the number of oracle queries.
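
The game can be mimicked by a toy oracle in Python. This is a sketch under stated assumptions: the class name `GenericRingOracle`, the string `"bot"` for the error symbol, the use of `"*"` for multiplication, and the return value `None` for a successful operation are our own choices. Instead of storing the sequence P and re-evaluating its prefixes, the sketch caches the values P_k(x), which is equivalent because a step is only appended after test() has succeeded.

```python
from math import gcd

class GenericRingOracle:
    """Toy generic ring oracle for Z_n holding a secret value x."""
    ERROR = "bot"  # stands in for the error symbol

    def __init__(self, n, x):
        self.n = n
        self.values = {-1: 1 % n, 0: x % n}  # values[k] caches P_k(x)
        self.length = 0                      # current length |P|
        self.queries = 0                     # complexity measure

    def _test(self, j, op):
        # False iff a division by a non-unit is requested.
        return not (op == "/" and gcd(self.values[j], self.n) != 1)

    def _equal(self, i, j):
        return self.values[i] == self.values[j]

    def query(self, i, j, op):
        if op not in ("+", "-", "*", "/", "="):
            raise ValueError("unknown operation: %r" % (op,))
        self.queries += 1
        if op == "=":
            return self._equal(i, j)
        if not self._test(j, op):
            return self.ERROR                # the step is not appended to P
        a, b, n = self.values[i], self.values[j], self.n
        if op == "+":
            v = (a + b) % n
        elif op == "-":
            v = (a - b) % n
        elif op == "*":
            v = (a * b) % n
        else:
            v = (a * pow(b, -1, n)) % n
        self.length += 1
        self.values[self.length] = v         # the step becomes P_{|P|}
        return None

oracle = GenericRingOracle(15, 3)
oracle.query(0, 0, "*")          # P_1 = x * x
print(oracle.query(-1, 1, "/"))  # bot, because 9 is not a unit modulo 15
print(oracle.query(0, 1, "="))   # False, because 3 != 9
```

The algorithm only ever sees indices, error symbols and equality bits, never a representation of a ring element.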


2.5 Some Lemmas on Straight Line Programs over Zn 


In the following we will state a few lemmas on straight line programs over $\mathbb{Z}_n$
that will be useful for the proof of our main theorem.


Lemma 2. Suppose there exists a straight line program $P$ such that for $x, x' \in \mathbb{Z}_n$
holds that $P(x') \neq \bot$ and $P(x) = \bot$. Then there exists $P_j \sqsubseteq P$ such that
$P_j(x') \in \mathbb{Z}_n^*$ and $P_j(x) \notin \mathbb{Z}_n^*$.


Proof. $P(x) = \bot$ means that there exists an SLP-step $(i, j, \circ) \in P$ such that
$\circ = /$ and $L_j = P_j(x) \notin \mathbb{Z}_n^*$. However, $P(x')$ does not evaluate to $\bot$, thus it
must hold that $P_j(x') \in \mathbb{Z}_n^*$.

The following lemma provides a lower bound on the probability of factoring $n$
by evaluating a certain straight line program $P$ with $y \xleftarrow{U} \mathcal{U}[C]$ and computing
$\gcd(n, P(y))$, relative to the probability that $P(x') \notin \mathbb{Z}_n^*$ and $P(x) \in \mathbb{Z}_n^*$ for
randomly chosen $x, x' \xleftarrow{U} C$.


Lemma 3. For any straight line program $P$ and $C \subseteq \mathbb{Z}_n$ holds that

\[ \Pr\left[ P(x') \notin \mathbb{Z}_n^* \text{ and } P(x) \in \mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C \right]
   \le \left( \frac{|\mathcal{U}[C]|}{|C|} \right)^2 \Pr\left[ \gcd(n, P(y)) \notin \{1, n\} \mid y \xleftarrow{U} \mathcal{U}[C] \right]. \]


Similar to the above, the following lemma provides a lower bound on the probability
of factoring $n$ by computing $\gcd(n, P(y) - Q(y))$ with $y \xleftarrow{U} \mathcal{U}[C]$ for two
given straight line programs $P$ and $Q$, relative to the probability
$\Pr[P(x) \equiv_n Q(x) \text{ and } P(x') \not\equiv_n Q(x') \mid x, x' \xleftarrow{U} C]$.


Lemma 4. For any pair $(P, Q)$ of straight line programs and $C \subseteq \mathbb{Z}_n$ holds that

\[ \Pr\left[ P(x) \equiv_n Q(x) \text{ and } P(x') \not\equiv_n Q(x') \mid x, x' \xleftarrow{U} C \right]
   \le \left( \frac{|\mathcal{U}[C]|}{|C|} \right)^2 \Pr\left[ \gcd(n, P(y) - Q(y)) \notin \{1, n\} \mid y \xleftarrow{U} \mathcal{U}[C] \right]. \]

The proofs of Lemma 3 and Lemma 4 are based on the Chinese Remainder Theorem.
Full proofs are given in Appendix C and D of the full version [13]. We also discuss
the intuition behind these lemmas in Appendix E of [13].
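
To get a feeling for Lemma 3, both probabilities can be computed exactly for a tiny instance. The following Python sketch uses assumed toy parameters that do not come from the paper: n = 15, C = {1, 14}, and the division-free one-step program P(x) = x - 1, written directly as a Python function.

```python
from fractions import Fraction
from math import gcd

n = 15
C = [1, 14]
# Uniform closure of C for n = 3 * 5, computed by brute force.
U = [y for y in range(n)
     if y % 3 in {c % 3 for c in C} and y % 5 in {c % 5 for c in C}]

def P(x):
    return (x - 1) % n  # corresponds to the single SLP-step (0, -1, -)

def is_unit(v):
    return gcd(v, n) == 1

# Left-hand side: Pr[P(x') is not a unit and P(x) is a unit], x, x' uniform in C.
gamma = Fraction(sum(1 for x in C for xp in C
                     if not is_unit(P(xp)) and is_unit(P(x))), len(C) ** 2)
# Probability that gcd(n, P(y)) is a non-trivial factor, y uniform in U[C].
lam = Fraction(sum(1 for y in U if gcd(n, P(y)) not in (1, n)), len(U))
bound = Fraction(len(U), len(C)) ** 2 * lam

print(gamma, lam, bound)  # 1/4 1/2 2
```

In this instance the left-hand side of Lemma 3 is 1/4 and the right-hand side is 2, so the inequality holds with room to spare.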


3 Subset Membership Problems in Generic Rings 


Definition 3 (Subset Membership Problem). Let $C \subseteq \mathbb{Z}_n$ and $V \subseteq \mathbb{Z}_n$ be
subsets of $\mathbb{Z}_n$ such that $V \subseteq C \subseteq \mathbb{Z}_n$. The subset membership problem defined
by $(C, V)$ is: given $x \xleftarrow{U} C$, decide whether $x \in V$.


Whenever considering a subset membership problem in the following we assume
that $|V| > 1$.

Let $(C, V)$ be subsets of $\mathbb{Z}_n$ defining a subset membership problem. We formalize
the notion of subset membership problems in the generic ring model in terms
of a game between an algorithm $\mathcal{A}$ and a generic ring oracle $\mathcal{O}_{smp}$. Oracle $\mathcal{O}_{smp}$
is defined exactly like the generic ring oracle described in Section 2.4, except
that $\mathcal{O}_{smp}$ receives a uniformly random element $x \xleftarrow{U} C$ as input. We say that $\mathcal{A}$
wins the game, if $x \in V$ and $\mathcal{A}^{\mathcal{O}_{smp}}(n) = 1$, or $x \notin V$ and $\mathcal{A}^{\mathcal{O}_{smp}}(n) = 0$.

Note that any algorithm for a given subset membership problem $(C, V)$ has
at least the trivial success probability $\Pi(C, V) := \max\{|V|/|C|,\ 1 - |V|/|C|\}$ by
guessing, due to the fact that $x$ is sampled uniformly from $C$. For an algorithm
solving the subset membership problem given by $(C, V)$ with success probability
$\Pr[S]$, we denote with

\[ \mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n)) := \left| \Pr[S] - \Pi(C, V) \right| \]

the advantage of $\mathcal{A}$.


Theorem 1. For any generic ring algorithm $\mathcal{A}$ solving a given subset membership
problem $(C, V)$ over $\mathbb{Z}_n$ with advantage $\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))$ by performing
$m$ queries to $\mathcal{O}_{smp}$, there exists an algorithm $\mathcal{B}$ that outputs a factor of $n$ with
success probability at least

\[ \frac{\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))}{2m(m^2 + 5m + 3)} \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2 \]

by running $\mathcal{A}$ once and performing $O(m^3)$ additional operations in $\mathbb{Z}_n$, $m$ gcd-computations
on $\lceil \log_2 n \rceil$-bit numbers, and sampling $m$ random elements from
$\mathcal{U}[C]$.

Proof Outline. We replace $\mathcal{O}_{smp}$ with a simulator $\mathcal{O}_{sim}$. Let $S_{sim}$ denote the
event that $\mathcal{A}$ is successful when interacting with the simulator, and let $F$ denote
the event that $\mathcal{O}_{sim}$ answers a query of $\mathcal{A}$ different from how $\mathcal{O}_{smp}$ would have
answered. Then $\mathcal{O}_{smp}$ and $\mathcal{O}_{sim}$ are indistinguishable unless $F$ occurs. Therefore
the success probability $\Pr[S]$ of $\mathcal{A}$ in the original game is upper bounded
by $\Pr[S_{sim}] + \Pr[F]$. We derive a bound on $\Pr[S_{sim}]$ and describe a factoring
algorithm whose success probability is lower bounded in terms of $\Pr[F]$.


3.1 Introducing a Simulation Oracle 


We replace oracle $\mathcal{O}_{smp}$ with a simulator $\mathcal{O}_{sim}$. $\mathcal{O}_{sim}$ receives $x \xleftarrow{U} C$ as input, but
never uses this value throughout the game. Instead, all computations are performed
independent of the challenge value $x$. Note that the original oracle $\mathcal{O}_{smp}$
uses $x$ only inside the test() and equal() procedures. Let us therefore consider
an oracle $\mathcal{O}_{sim}$ which is defined exactly like $\mathcal{O}_{smp}$, but replaces the procedures
test() and equal() with procedures testsim() and equalsim().


— The testsim()-procedure samples $x_r \xleftarrow{U} C$ and returns false if $\circ = /$ and
$P_j(x_r) \notin \mathbb{Z}_n^*$, and true otherwise (even if $P_j(x_r) = \bot$).

— The equalsim()-procedure samples $x_r \xleftarrow{U} C$ and returns true if $P_i(x_r) \equiv
P_j(x_r) \bmod n$ and false otherwise (even if $P_i(x_r) = \bot$ or $P_j(x_r) = \bot$).


Note that the simulator samples $m$ random values $x_r$, $r \in \{1, \ldots, m\}$. Also note
that all computations of $\mathcal{A}$ are independent of the challenge value $x$ when interacting
with $\mathcal{O}_{sim}$. Hence, any algorithm $\mathcal{A}$ has at most trivial success probability
in the simulation game, and therefore

\[ \Pr[S_{sim}] \le \Pi(C, V). \]


3.2 Bounding the Probability of Simulation Failure 


We say that a simulation failure, denoted $F$, occurs if $\mathcal{O}_{sim}$ does not simulate
$\mathcal{O}_{smp}$ perfectly. Observe that an interaction of $\mathcal{A}$ with $\mathcal{O}_{sim}$ is perfectly
indistinguishable from an interaction with $\mathcal{O}_{smp}$, unless at least one of the following
events occurs.


1. The testsim()-procedure fails to simulate test() perfectly. This means that
testsim() returns false on a procedure call where test() would have returned
true, or testsim() returns true where test() would have returned false. Let
$F_{test}$ denote the event that this happens on at least one call of testsim().

2. The equalsim()-procedure fails to simulate equal() perfectly. This means that
equalsim() has returned true where equal() would have returned false, or
equalsim() has returned false where equal() would have returned true. Let
$F_{equal}$ denote the event that this happens on at least one call of equalsim().


Since $F$ implies that at least one of the events $F_{test}$ and $F_{equal}$ has occurred, it
holds that

\[ \Pr[F] \le \Pr[F_{test}] + \Pr[F_{equal}]. \]


In the following we will bound $\Pr[F_{test}]$ and $\Pr[F_{equal}]$ separately.

Bounding the Probability of $F_{test}$. The testsim()-procedure fails to simulate
test() only if either testsim() has returned false where test() would have returned
true, or testsim() has returned true where test() would have returned false. A
necessary condition for this is that there exists $P_j \sqsubseteq P$ and $x_r \in \{x_1, \ldots, x_m\}$
such that

\[ (P_j(x) \in \mathbb{Z}_n^* \text{ and } P_j(x_r) \notin \mathbb{Z}_n^*) \text{ or } (P_j(x) = \bot \text{ and } P_j(x_r) \notin \mathbb{Z}_n^*), \]

or

\[ (P_j(x_r) \in \mathbb{Z}_n^* \text{ and } P_j(x) \notin \mathbb{Z}_n^*) \text{ or } (P_j(x_r) = \bot \text{ and } P_j(x) \notin \mathbb{Z}_n^*). \]

We can simplify this condition a little by applying Lemma 2. The existence of
$P_j \sqsubseteq P$ and $x_r$ such that ($P_j(x_r) = \bot$ and $P_j(x) \notin \mathbb{Z}_n^*$) implies the existence
of $P_k \sqsubseteq P$ such that $k < j$ and ($P_k(x_r) \notin \mathbb{Z}_n^*$ and $P_k(x) \in \mathbb{Z}_n^*$). An analogous
argument holds for the case ($P_j(x) = \bot$ and $P_j(x_r) \notin \mathbb{Z}_n^*$). Hence, the
testsim()-procedure fails to simulate test() only if there exists $P_j \sqsubseteq P$ such that

\[ (P_j(x) \in \mathbb{Z}_n^* \text{ and } P_j(x_r) \notin \mathbb{Z}_n^*) \text{ or } (P_j(x_r) \in \mathbb{Z}_n^* \text{ and } P_j(x) \notin \mathbb{Z}_n^*). \]

Proposition 1

\[ \Pr[F_{test}] \le 2m(m + 2) \max_{0 \le j \le m} \left\{ \Pr\left[ P_j(x) \notin \mathbb{Z}_n^* \text{ and } P_j(x') \in \mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C \right] \right\} \]

We sketch the proof of Proposition 1 in Appendix B. A full proof is given in
Appendix F of the full version.


Bounding the Probability of $F_{equal}$. The equalsim()-procedure fails to simulate
equal() only if either equalsim() has returned false where equal() would
have returned true, or equalsim() has returned true where equal() would have
returned false. A necessary condition for this is that there exist $P_i, P_j \sqsubseteq P$ and
$x_r \in \{x_1, \ldots, x_m\}$ such that

\[ \begin{array}{l}
(P_i(x) \equiv_n P_j(x) \text{ and } P_i(x_r) \not\equiv_n P_j(x_r)) \\
\text{or } (P_i(x) \equiv_n P_j(x) \text{ and } (P_i(x_r) = \bot \text{ or } P_j(x_r) = \bot)) \\
\text{or } (P_i(x_r) \equiv_n P_j(x_r) \text{ and } P_i(x) \not\equiv_n P_j(x)) \\
\text{or } (P_i(x_r) \equiv_n P_j(x_r) \text{ and } (P_i(x) = \bot \text{ or } P_j(x) = \bot)).
\end{array} \]

Again we can apply Lemma 2 to simplify this a little: the existence of $P_j \sqsubseteq P$
and $x_r$ such that ($P_j(x_r) = \bot$ and $P_j(x) \neq \bot$) implies the existence of $P_k \sqsubseteq P$
such that ($P_k(x_r) \notin \mathbb{Z}_n^*$ and $P_k(x) \in \mathbb{Z}_n^*$). Analogous arguments hold for the
other cases where one straight line program evaluates to $\bot$. Hence, the equalsim()-procedure
fails to simulate equal() only if there exist $P_i, P_j \sqsubseteq P$ or $P_k \sqsubseteq P$ such
that

\[ \begin{array}{l}
(P_i(x) \equiv_n P_j(x) \text{ and } P_i(x_r) \not\equiv_n P_j(x_r)) \\
\text{or } (P_i(x_r) \equiv_n P_j(x_r) \text{ and } P_i(x) \not\equiv_n P_j(x)) \\
\text{or } (P_k(x_r) \notin \mathbb{Z}_n^* \text{ and } P_k(x) \in \mathbb{Z}_n^*) \\
\text{or } (P_k(x) \notin \mathbb{Z}_n^* \text{ and } P_k(x_r) \in \mathbb{Z}_n^*).
\end{array} \]


2 The condition is not sufficient, because algorithm $\mathcal{A}$ need not have queried a division
by $P_j$ in its $r$-th query.

3 The condition is not sufficient, because algorithm $\mathcal{A}$ need not have queried $(i, j, =)$
in its $r$-th query.

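Proposition 2

\[ \Pr[F_{equal}] \le 2m(m^2 + 3m + 1)\Phi + 2m(m + 1)\Psi, \]

where

\[ \Phi = \max_{-1 \le i < j \le m} \left\{ \Pr\left[ P_i(x) \equiv_n P_j(x) \text{ and } P_i(x') \not\equiv_n P_j(x') \mid x, x' \xleftarrow{U} C \right] \right\}, \]
\[ \Psi = \max_{0 \le k \le m} \left\{ \Pr\left[ P_k(x) \notin \mathbb{Z}_n^* \text{ and } P_k(x') \in \mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C \right] \right\}. \]


The proof of Proposition 2, which is based on the same ideas as the proof of
Proposition 1, is given in Appendix G of the full version.


Bounding the Probability of $F$. Summing up, we obtain that the total
probability of $F$ is at most

\[ \Pr[F] \le \Pr[F_{test}] + \Pr[F_{equal}] \le 2m(m^2 + 3m + 1)\Phi + 4m(m + 1)\Psi, \]

where $\Phi$ and $\Psi$ are defined as above.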


3.3 Bounding the Success Probability 


Since all computations of $\mathcal{A}$ are independent of the challenge value $x$ in the
simulation game, any algorithm has only the trivial success probability when
interacting with the simulator. Thus the success probability of any algorithm
when interacting with the original oracle is bounded by

\[ \Pi(C, V) + \mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n)) = \Pr[S] \le \Pr[S_{sim}] + \Pr[F] \le \Pi(C, V) + \Pr[F], \]

which implies

\[ \mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n)) \le \Pr[F]. \]


3.4 The Factoring Algorithm 


Consider a factoring algorithm $\mathcal{B}$ running $\mathcal{A}$, recording the sequence of queries
$\mathcal{A}$ issues, and proceeding as follows.

— Whenever the algorithm submits $(i, j, \circ)$ with $\circ \in \{+, -, \cdot, /\}$ in its $r$-th
query, the algorithm samples $y \xleftarrow{U} \mathcal{U}[C]$ and computes $\gcd(P_k(y), n)$ for
$0 \le k \le r$.

— Whenever the algorithm submits $(i, j, \circ)$ with $\circ \in \{=\}$ in its $r$-th query,
the algorithm samples $y \xleftarrow{U} \mathcal{U}[C]$ and computes $\gcd(P_i(y) - P_j(y), n)$ for
$-1 \le i < j \le r$.


Running time. By assumption, $\mathcal{A}$ submits $m$ queries. Thus, the algorithm evaluates
$O(m^2)$ straight line programs. Each query can be evaluated by performing
at most $m$ steps, which yields $O(m^3)$ operations in $\mathbb{Z}_n$. Moreover, the algorithm
samples $m$ random values $y$ from $\mathcal{U}[C]$ and performs $m$ gcd-computations on
$\lceil \log_2 n \rceil$-bit numbers.
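
The gcd step of the factoring algorithm can be sketched as follows. This is a simplified illustration, not the algorithm exactly as specified above: it processes the whole recorded program with a single sample y instead of sampling a fresh y per query, and the names `slp_values` and `try_factor`, the tuple encoding of SLP-steps, and the toy inputs are assumptions.

```python
from math import gcd

def slp_values(P, y, n):
    """Return [L_{-1}, L_0, L_1, ...] for input y, stopping before a failing division."""
    vals = [1 % n, y % n]                 # vals[k + 1] holds L_k
    for (i, j, op) in P:
        a, b = vals[i + 1], vals[j + 1]
        if op == "+":
            v = (a + b) % n
        elif op == "-":
            v = (a - b) % n
        elif op == "*":
            v = (a * b) % n
        elif op == "/":
            if gcd(b, n) != 1:
                break                     # b is a non-unit and is already in vals
            v = (a * pow(b, -1, n)) % n
        else:
            raise ValueError("unknown operation: %r" % (op,))
        vals.append(v)
    return vals

def try_factor(P, y, n):
    """Look for a non-trivial factor of n among gcd(P_k(y), n) and gcd(P_i(y) - P_j(y), n)."""
    vals = slp_values(P, y, n)
    candidates = list(vals)
    candidates += [vals[s] - vals[t]
                   for s in range(len(vals)) for t in range(s + 1, len(vals))]
    for c in candidates:
        g = gcd(c, n)
        if 1 < g < n:
            return g
    return None

P = [(0, -1, "-")]            # hypothetical recorded program: x - 1
print(try_factor(P, 4, 15))   # 3, because P_1(4) = 3 shares a factor with 15
print(try_factor(P, 2, 15))   # None, no candidate has a non-trivial gcd with 15
```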


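Success probability. $\mathcal{B}$ evaluates any straight line program $P_k$ with a uniformly
random element $y$ of $\mathcal{U}[C]$. In particular, $\mathcal{B}$ computes $\gcd(P_k(y), n)$ for
$y \xleftarrow{U} \mathcal{U}[C]$ and the straight line program $P_k \sqsubseteq P$ satisfying

\[ \Pr\left[ P_k(x) \notin \mathbb{Z}_n^* \text{ and } P_k(x') \in \mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C \right]
   = \max_{0 \le \ell \le m} \left\{ \Pr\left[ P_\ell(x) \notin \mathbb{Z}_n^* \text{ and } P_\ell(x') \in \mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C \right] \right\}. \]

Let $\gamma_1 := \max_{0 \le k \le m} \{ \Pr[P_k(x) \notin \mathbb{Z}_n^* \text{ and } P_k(x') \in \mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C] \}$, then
by Lemma 3 algorithm $\mathcal{B}$ finds a factor in this step with probability at least

\[ \gamma_1 \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2. \]

Moreover, $\mathcal{B}$ evaluates any pair $P_i, P_j$ of straight line programs in $P$ with a
uniformly random element $y \xleftarrow{U} \mathcal{U}[C]$. So in particular $\mathcal{B}$ evaluates
$\gcd(P_i(y) - P_j(y), n)$ with $y \xleftarrow{U} \mathcal{U}[C]$ for the pair of straight line programs
$P_i, P_j \sqsubseteq P$ satisfying

\[ \Pr\left[ P_i(x) \equiv_n P_j(x) \text{ and } P_i(x') \not\equiv_n P_j(x') \mid x, x' \xleftarrow{U} C \right]
   = \max_{-1 \le s < t \le m} \left\{ \Pr\left[ P_s(x) \equiv_n P_t(x) \text{ and } P_s(x') \not\equiv_n P_t(x') \mid x, x' \xleftarrow{U} C \right] \right\}. \]

Let $\gamma_2 := \max_{-1 \le i < j \le m} \{ \Pr[P_i(x) \equiv_n P_j(x) \text{ and } P_i(x') \not\equiv_n P_j(x') \mid x, x' \xleftarrow{U} C] \}$,
then by Lemma 4 algorithm $\mathcal{B}$ succeeds in this step with probability at least
$\gamma_2 \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2$. So, for $\gamma := \max\{\gamma_1, \gamma_2\}$, the total success probability of algorithm
$\mathcal{B}$ is at least

\[ \gamma \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2. \]

Relating the success probability of $\mathcal{B}$ to the advantage of $\mathcal{A}$. Using the
above definitions of $\gamma_1$, $\gamma_2$, and $\gamma$, the fact that $\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n)) \le \Pr[F]$,
and the derived bound on $\Pr[F]$, we can obtain a lower bound on $\gamma$ by

\[ \mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n)) \le \Pr[F] \le 4m(m + 1)\gamma_1 + 2m(m^2 + 3m + 1)\gamma_2 \le 2m(m^2 + 5m + 3)\gamma, \]

which implies the inequality

\[ \gamma \ge \frac{\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))}{2m(m^2 + 5m + 3)}. \]

Therefore the success probability of $\mathcal{B}$ is at least

\[ \frac{\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))}{2m(m^2 + 5m + 3)} \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2. \]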

4 Computing the Jacobi Symbol with Generic Ring 
Algorithms 


Let us denote with $QR_n \subseteq \mathbb{Z}_n$ the set of quadratic residues modulo $n$, i.e.

\[ QR_n := \{ x \in \mathbb{Z}_n^* \mid x \equiv y^2 \bmod n,\ y \in \mathbb{Z}_n^* \}. \]

Let $(x \mid n)$ denote the Jacobi symbol [23, p.287] and let $J_n := \{ x \in \mathbb{Z}_n \mid (x \mid n) = 1 \}$
be the set of elements of $\mathbb{Z}_n$ having Jacobi symbol 1. Recall that $QR_n \subseteq J_n$,
and therefore given $x \in \mathbb{Z}_n \setminus J_n$ it is easy to decide that $x$ is not a quadratic
residue by computing the Jacobi symbol.

There exist simple efficient algorithms computing the Jacobi symbol in $\mathbb{Z}_n$
without factoring $n$. These algorithms are not generic, cf. [23, p.288].
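
One such algorithm is the standard binary method based on quadratic reciprocity, sketched below in Python. The function name `jacobi` and the test modulus n = 15 are illustrative assumptions. The algorithm is not generic: it inspects parities, reduces a modulo n after swapping, and looks at residues modulo 4 and 8, all of which depend on the integer representation of ring elements.

```python
def jacobi(a, n):
    """Return the Jacobi symbol (a | n) for an odd positive integer n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be an odd positive integer")
    a %= n
    result = 1
    while a != 0:
        # Pull out factors of two, using (2 | n) = -1 iff n = 3 or 5 mod 8.
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swapping flips the sign iff both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

print([x for x in range(1, 15) if jacobi(x, 15) == 1])  # [1, 2, 4, 8]
```

For n = 15 the element 2 has Jacobi symbol 1 although it is not a square modulo 3, so it lies in J_n but not in QR_n.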


Theorem 2. Suppose there exists a generic ring algorithm $\mathcal{A}$ solving the subset
membership problem given by $(C, V)$ with $C = \mathbb{Z}_n^*$ and $V = J_n$ with advantage
$\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))$ by performing $m$ ring operations. Then there exists an
algorithm $\mathcal{B}$ finding a factor of $n$ with probability at least

\[ \frac{\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))}{2m(m^2 + 5m + 3)} \]

by running $\mathcal{A}$ once and performing $O(m^3)$ additional operations in $\mathbb{Z}_n$, $m$ gcd-computations
on $\lceil \log_2 n \rceil$-bit numbers, and sampling $m$ random elements from
$\mathbb{Z}_n^*$.


Proof. The theorem follows by applying Theorem 1 and the fact that $\mathcal{U}[\mathbb{Z}_n^*] = \mathbb{Z}_n^*$,
since

\[ \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2 = \left( \frac{|\mathbb{Z}_n^*|}{|\mathbb{Z}_n^*|} \right)^2 = 1. \]


5 The Generic Quadratic Residuosity Problem and
Factoring


Definition 4 (Quadratic Residuosity Problem). The quadratic residuosity
problem is the subset membership problem given by $C = J_n$ and $V = QR_n$.


Given the factorization of $n$, solving the quadratic residuosity problem in $\mathbb{Z}_n$ is easy,
also for generic ring algorithms. Thus, in order to show the equivalence of generic
quadratic residuosity and factoring, we have to prove the following theorem.


Theorem 3. Suppose there exists a generic ring algorithm $\mathcal{A}$ that solves the
quadratic residuosity problem in $\mathbb{Z}_n$ with advantage $\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))$ by
performing $m$ ring operations. Then there exists an algorithm $\mathcal{B}$ finding a factor of
$n$ with probability at least

\[ \frac{\mathsf{Adv}_{(C,V)}(\mathcal{A}^{\mathcal{O}_{smp}}(n))}{8m(m^2 + 5m + 3)} \]

by running $\mathcal{A}$ once and performing $O(m^3)$ additional operations in $\mathbb{Z}_n$, $m$ gcd-computations
on $\lceil \log_2 n \rceil$-bit numbers, and sampling $m$ random elements from $\mathbb{Z}_n^*$.


Proof. The cardinality $|J_n|$ of the set of elements having Jacobi symbol 1 depends
on whether $n$ is a square in $\mathbb{N}$:

\[ |J_n| = \begin{cases} \varphi(n)/2, & \text{if } n \text{ is not a square in } \mathbb{N}, \\ \varphi(n), & \text{if } n \text{ is a square in } \mathbb{N}, \end{cases} \]

where $\varphi(\cdot)$ is the Euler totient function [23, p.24]. Note also that $\mathcal{U}[J_n] = \mathcal{U}[C] =
\mathbb{Z}_n^*$. Therefore it holds that $|J_n| = |C| \ge \varphi(n)/2$ and $|\mathcal{U}[C]| = |\mathbb{Z}_n^*| = \varphi(n)$. Thus
we can apply Theorem 1 using that

\[ \left( \frac{|C|}{|\mathcal{U}[C]|} \right)^2 = \left( \frac{|J_n|}{\varphi(n)} \right)^2 \ge \left( \frac{1}{2} \right)^2 = \frac{1}{4}. \]


6 The Generic Subgroup Decision Problem and Factoring 


Let $n = pq$ and let $G$ be a cyclic group of order $n$. Then there exists a subgroup
$G_p \subseteq G$ of order $p$.


Definition 5 (Subgroup Decision Problem). The subgroup decision problem
is the subset membership problem $(C, V)$ with $C = G$ and $V = G_p$.


Recall that any cyclic group of order $n$ is isomorphic to the additive group of
integers $(\mathbb{Z}_n, +)$. Now, since we are going to consider generic algorithms, we may
assume that the algorithm operates on the group $G = (\mathbb{Z}_n, +)$, of course without
exploiting any property of this representation. Assuming an oracle DH solving
the Diffie-Hellman problem in $G$, we observe that this operation corresponds
to the multiplication in $\mathbb{Z}_n$. Hence, the group $G$ together with oracle DH exhibits
the same algebraic structure as the ring $\mathbb{Z}_n$.

4 One may equivalently assume that the generic group oracle uses the group $(\mathbb{Z}_n, +)$
for the internal representation of group elements.

By the Chinese Remainder Theorem, the ring $\mathbb{Z}_n$ is isomorphic to the direct
product $\mathbb{Z}_p \times \mathbb{Z}_q$. Let $\phi: \mathbb{Z}_p \times \mathbb{Z}_q \to \mathbb{Z}_n$ denote this isomorphism. The subgroup
$G_p$ of $G$ with order $p$ consists of the elements $G_p = \{ \phi(x_p, 0) \mid x_p \in \mathbb{Z}_p \}$. So for
generic ring algorithms the subgroup decision problem can be stated as: given
$x \in \mathbb{Z}_n$, decide whether $x \equiv 0 \bmod q$.

In order to model the generic subgroup decision problem, consider an oracle
$\mathcal{O}_{sdp}$ which is defined exactly like the generic ring oracle described in Section 2.4,
except that it does not provide the operation $/$. $\mathcal{O}_{sdp}$ receives an element $x \in \mathbb{Z}_n$
as input, where $x$ is constructed as follows: sample $(x_p, x_q) \xleftarrow{U} \mathbb{Z}_p \times \mathbb{Z}_q$ and bit
$b \xleftarrow{U} \{0, 1\}$ uniformly random, and let $x := (x_p, b x_q)$. An algorithm can query
the oracle for the (inverse) group operation by submitting a query $(i, j, \circ)$ with
$\circ \in \{+, -\}$. The Diffie-Hellman oracle is queried by submitting $(i, j, \circ)$ with
$\circ \in \{\cdot\}$.
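
The construction of the challenge can be illustrated with a short Python sketch. The helper names `crt_pair`, `sample_challenge` and `in_subgroup`, as well as the toy primes p = 3 and q = 5, are assumptions for this example. The sketch identifies the pair (x_p, b x_q) with its image in Z_n under the Chinese Remainder isomorphism.

```python
import random

def crt_pair(xp, xq, p, q):
    """Map (xp, xq) in Z_p x Z_q to the unique x in Z_pq with x = xp mod p and x = xq mod q."""
    return (xp * q * pow(q, -1, p) + xq * p * pow(p, -1, q)) % (p * q)

def sample_challenge(p, q, rng=random):
    """Sample x as the image of (x_p, b * x_q) for uniform x_p, x_q and a uniform bit b."""
    xp, xq = rng.randrange(p), rng.randrange(q)
    b = rng.randrange(2)
    return crt_pair(xp, b * xq, p, q)

def in_subgroup(x, q):
    """x lies in G_p iff its Z_q-component vanishes."""
    return x % q == 0

p, q = 3, 5
G_p = {crt_pair(xp, 0, p, q) for xp in range(p)}
print(sorted(G_p))  # [0, 5, 10]
```

With these toy parameters G_p is the subgroup {0, 5, 10} of order 3 in (Z_15, +), and membership is exactly the condition x = 0 mod q.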

We say that the algorithm wins the game, if $x \in G_p$ and $\mathcal{A}^{\mathcal{O}_{sdp}}(n) = 1$, or
$x \notin G_p$ and $\mathcal{A}^{\mathcal{O}_{sdp}}(n) = 0$. We define the advantage of an algorithm $\mathcal{A}$ solving
the subgroup decision problem with probability $\Pr[S]$ as

\[ \mathsf{Adv}(\mathcal{A}^{\mathcal{O}_{sdp}}(n)) := \left| \Pr[S] - \left( \frac{1}{2} + \frac{1}{2q} \right) \right|. \]


Remark 1. If we also allowed the algorithm to query the oracle for divisions (which
correspond to an "inverse Diffie-Hellman oracle" in the above setting), then there
would be a simple algorithm determining whether $x \in G_p$ by returning true iff
division by $x$ fails. Interestingly, we will show that there is no generic algorithm
making similar use of a standard Diffie-Hellman oracle, unless factoring $n$ is easy.
Therefore a further consequence of the theorem presented in the following section
is that a standard Diffie-Hellman oracle does not imply an inverse Diffie-Hellman
oracle in general, unless factoring is easy.


Remark 2. The subgroup decision problem was introduced in [5] for groups with
bilinear pairing. Essentially such a pairing can be added to the generic model by
allowing the algorithm to perform a single multiplication operation when evaluating
the bilinear pairing map as done in [4]. By providing a Diffie-Hellman
oracle, we do not restrict the algorithm to a fixed number of multiplications.
Hence, our proof includes the problem stated in [5] as a special case.


6.1 Generic Equivalence to Factoring 


In the sequel we show that solving the subgroup decision problem in groups of 
order n is as hard as factoring n, even if the algorithm has access to an oracle 
solving the Diffie-Hellman problem. 


5 Plus some minor technical details to distinguish between different groups.

Theorem 4. Suppose there exists a generic ring algorithm $\mathcal{A}$ solving the subgroup
decision problem in $G$ with advantage $\mathsf{Adv}(\mathcal{A}^{\mathcal{O}_{sdp}}(n))$ by making $m$
queries to an oracle performing the (inverse) group operation and solving the
Diffie-Hellman problem. Then there exists an algorithm $\mathcal{B}$ finding a factor of $n$
with probability at least $\mathsf{Adv}(\mathcal{A}^{\mathcal{O}_{sdp}}(n))$ by running $\mathcal{A}$ once and performing $O(m^3)$
additional operations in $\mathbb{Z}_n$ and $m$ gcd-computations on $\lceil \log_2 n \rceil$-bit numbers.


Proof. Let us consider an interaction of $\mathcal{A}$ with an oracle $\mathcal{O}_p$, which is defined as
follows. $\mathcal{O}_p$ works similar to $\mathcal{O}_{sdp}$, but performs all computations in $\mathbb{Z}_p$. That is,
the equal()-procedure returns true on input $(i, j)$ iff $P_i(x) \equiv P_j(x) \bmod p$. Note
that now all computations are performed in the $\mathbb{Z}_p$-component of the decomposition
$\mathbb{Z}_p \times \mathbb{Z}_q$ of $\mathbb{Z}_n$, hence the algorithm receives no information on whether
$x \equiv 0 \bmod q$. Thus in the simulation game any algorithm has only trivial success
probability $\Pr[S_{sim}] = 1/2 + 1/(2q)$.

Now consider an interaction of $\mathcal{A}$ with oracle $\mathcal{O}_{sdp}$. Either this interaction
is indistinguishable from an interaction with oracle $\mathcal{O}_p$, in which case the algorithm
has only trivial success probability, or there exist $P_i, P_j \sqsubseteq P$ such that $P_i(x) \equiv
P_j(x) \bmod p$, but $P_i(x) \not\equiv P_j(x) \bmod n$. In this case a factor of $n$ is found by
computing $\gcd(P_i(x) - P_j(x), n)$. Note that

\[ \frac{1}{2} + \frac{1}{2q} + \mathsf{Adv}(\mathcal{A}^{\mathcal{O}_{sdp}}(n)) \le \Pr[S_{sim}] + \Pr[F]
   \iff \mathsf{Adv}(\mathcal{A}^{\mathcal{O}_{sdp}}(n)) \le \Pr[F]. \]

Thus, $n$ is factored this way by running $\mathcal{A}$, recording $P$ and computing

\[ \gcd(P_i(x) - P_j(x), n) \]

for all $-1 \le i < j \le m$ with probability at least $\mathsf{Adv}(\mathcal{A}^{\mathcal{O}_{sdp}}(n))$.


The above proof generalizes from $n = pq$ to $n = \prod_{i=1}^{k} p_i^{e_i}$ for all subgroups with
prime-power order $p_i^{e_i}$ in a straightforward manner.


7 Analyzing Search Problems in the Generic Ring Model 


In Section 3 we have constructed a simulator for a generic ring oracle for the ring
$\mathbb{Z}_n$. When interacting with the simulator, all computations are independent of
the secret challenge value $x$. Therefore we have been able to conclude that any
generic algorithm has only the trivial probability of success in solving certain
decisional problems (namely the considered subset membership problems) when
interacting with the simulator. Moreover, we have shown that any algorithm
that can distinguish between simulator and original oracle can be turned into a
factoring algorithm with (asymptotically) the same running time.

In contrast to decisional problems, where the algorithm outputs a bit, our
construction of the simulator can also be applied to prove the generic hardness
of search problems where the algorithm outputs a ring element or integer. Let
us sketch two possibilities. The first one is to formulate a suitable subset membership
problem which reduces to the considered search problem and then apply
Theorem 1. Another possibility is to use our construction of the simulator to
bound the probability of a simulation failure relative to factoring. In order to
bound the success probability in the simulation game, it remains to show that
there exists no straight line program solving the considered problem efficiently
under the factoring assumption.


Acknowledgements. We would like to thank Andy Rupp and Sven Schäge for
helpful discussions, and Yvo Desmedt and the program committee members for


A Proof Sketch for Lemma 3


For notational convenience, let us define $\Gamma(P) := \Pr[P(x') \notin \mathbb{Z}_n^* \text{ and } P(x) \in
\mathbb{Z}_n^* \mid x, x' \xleftarrow{U} C]$ and $\Lambda(P) := \Pr[\gcd(n, P(y)) \notin \{1, n\} \mid y \xleftarrow{U} \mathcal{U}[C]]$. Thus, in
order to prove Lemma 3 we have to show that the inequality

\[ \left( \frac{|\mathcal{U}[C]|}{|C|} \right)^2 \Lambda(P) \ge \Gamma(P) \qquad (1) \]


holds. To this end, we will define an auxiliary function v;(P). Then we express 
T(P) and A(P) in terms of v;(P). More precisely, we will upper bound I (P) by 
an expression in 1;(P) and lower bound A(P) by an expression in v;(P). The 
resulting inequality is proven easily by complete induction. 
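Defining an auxiliary function. Recall that we denote with $n = \prod_{i=1}^{k} p_i^{e_i}$
the prime factor decomposition of n. Let

$$\nu_i(P) := \Pr[P(x) \equiv 0 \bmod p_i \mid x \stackrel{\$}{\leftarrow} U[C]]$$

be the probability that $P(x) \equiv 0 \bmod p_i$ for some prime $p_i$ dividing n and
$x \stackrel{\$}{\leftarrow} U[C]$. Recall that $\phi : \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_k^{e_k}} \to \mathbb{Z}_n$ is a ring isomorphism, and P performs
only ring operations in $\mathbb{Z}_n$. Therefore P implicitly performs all operations on each
component $\mathbb{Z}_{p_i^{e_i}}$ separately (and independently). Moreover, sampling $x \stackrel{\$}{\leftarrow} U[C]$
is equivalent to sampling $\phi(x_1, \ldots, x_k)$ with $x_i$ chosen independently and uniformly
from $C_i$ for $1 \leq i \leq k$ (cf. Lemma [}). Thus we can express the probability that
$P(x) \in \mathbb{Z}_n^*$ for $x \stackrel{\$}{\leftarrow} U[C]$ as

$$\Pr[P(x) \in \mathbb{Z}_n^* \mid x \stackrel{\$}{\leftarrow} U[C]] = \prod_{i=1}^{k} (1 - \nu_i(P)).$$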


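Bounding $\Gamma(P)$ in terms of $\nu_i(P)$. For independently sampled $x, x'$, we
have

$$\Gamma(P) = \Pr[P(x') \notin \mathbb{Z}_n^* \text{ and } P(x) \in \mathbb{Z}_n^* \mid x, x' \stackrel{\$}{\leftarrow} C]$$
$$= \Pr[P(x') \notin \mathbb{Z}_n^* \mid x' \stackrel{\$}{\leftarrow} C] \cdot \Pr[P(x) \in \mathbb{Z}_n^* \mid x \stackrel{\$}{\leftarrow} C].$$

Note that, since $C \subseteq U[C]$, it holds that

$$\Pr[P(x) \in \mathbb{Z}_n^* \mid x \stackrel{\$}{\leftarrow} C] \leq \Pr[P(y) \in \mathbb{Z}_n^* \mid y \stackrel{\$}{\leftarrow} U[C]] \cdot \frac{|U[C]|}{|C|}$$

and similarly

$$\Pr[P(x) \notin \mathbb{Z}_n^* \mid x \stackrel{\$}{\leftarrow} C] \leq \left(1 - \Pr[P(y) \in \mathbb{Z}_n^* \mid y \stackrel{\$}{\leftarrow} U[C]]\right) \cdot \frac{|U[C]|}{|C|}.$$

Therefore we can conclude that

$$\Gamma(P) \leq \Pr[P(y) \in \mathbb{Z}_n^* \mid y \stackrel{\$}{\leftarrow} U[C]] \left(1 - \Pr[P(y) \in \mathbb{Z}_n^* \mid y \stackrel{\$}{\leftarrow} U[C]]\right) \left(\frac{|U[C]|}{|C|}\right)^{2}$$
$$= \prod_{i=1}^{k} (1 - \nu_i(P)) \left(1 - \prod_{i=1}^{k} (1 - \nu_i(P))\right) \left(\frac{|U[C]|}{|C|}\right)^{2}. \qquad (2)$$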


Bounding $A(P)$ in terms of $\nu_i(P)$. We can find a factor of n by computing
$\gcd(n, P(y))$, if $P(y) \equiv 0 \bmod p_i$ for at least one prime $p_i$ dividing n, and
$P(y) \not\equiv 0 \bmod n$. Using similar arguments as above, we can therefore express $A(P)$ in
terms of $\nu_i(P)$ as

$$A(P) = \Pr[\gcd(n, P(y)) \notin \{1, n\} \mid y \stackrel{\$}{\leftarrow} U[C]]$$
$$= 1 - \prod_{i=1}^{k} \nu_i(P) - \prod_{i=1}^{k} (1 - \nu_i(P)). \qquad (3)$$


Putting things together. Combining (2) and (3), we see that (1) holds if

$$\left(1 - \prod_{i=1}^{k} (1 - \nu_i(P))\right)^{2} \geq \prod_{i=1}^{k} \nu_i(P)$$

holds, which is shown easily by complete induction on $k \geq 2$.
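As a numerical sanity check of this last step (not a substitute for the induction), the following Python sketch tests the inequality, read here as $(1 - \prod_i (1 - \nu_i))^2 \geq \prod_i \nu_i$, on a grid and on random points of $[0,1]^k$. The function names and the floating-point tolerance are illustrative assumptions, and the reading of the inequality is itself reconstructed from bounds (2) and (3).

```python
import itertools
import math
import random


def lhs(nus):
    """Left-hand side: (1 - prod_i (1 - nu_i))^2."""
    return (1.0 - math.prod(1.0 - v for v in nus)) ** 2


def rhs(nus):
    """Right-hand side: prod_i nu_i."""
    return math.prod(nus)


def holds(nus, tol=1e-12):
    """True if the inequality holds at the point nus (up to rounding)."""
    return lhs(nus) + tol >= rhs(nus)


def check_grid(k, steps=10):
    """Check all points of a regular grid in [0, 1]^k."""
    grid = [i / steps for i in range(steps + 1)]
    return all(holds(p) for p in itertools.product(grid, repeat=k))


def check_random(k, trials=10000, seed=0):
    """Check randomly sampled points of [0, 1]^k."""
    rng = random.Random(seed)
    return all(holds([rng.random() for _ in range(k)]) for _ in range(trials))
```

For k = 1 the inequality fails (for example, $\nu_1 = 1/2$ gives 1/4 < 1/2), which is consistent with the restriction to k of at least 2.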


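B Proof Sketch for Proposition [M


If there exists $P_j$ such that ($P_j(x) = \bot$ and $P_j(x_r) \neq \bot$), then this implies that
there exists $P_k \in \mathcal{P}$ with $k < j$ such that ($P_k(x_r) \notin \mathbb{Z}_n^*$ and $P_k(x) \in \mathbb{Z}_n^*$)
by Lemma. Hence, in order to bound the probability of $F_{test}$, it suffices to
consider the probability that there exists a straight line program $P_j \in \mathcal{P}$ such
that

$$(P_j(x_r) \notin \mathbb{Z}_n^* \text{ and } P_j(x) \in \mathbb{Z}_n^*) \text{ or } (P_j(x) \notin \mathbb{Z}_n^* \text{ and } P_j(x_r) \in \mathbb{Z}_n^*) \qquad (4)$$

for $x, x_1, \ldots, x_m \stackrel{\$}{\leftarrow} C$.
By (essentially) applying the union bound we can see that for fixed $P_j$ this
probability is bounded by

$$2m \Pr[P_j(x) \notin \mathbb{Z}_n^* \text{ and } P_j(x') \in \mathbb{Z}_n^* \mid x, x' \stackrel{\$}{\leftarrow} C].$$

Using this, we obtain the following bound on the probability that there exists
any $P_j \in \mathcal{P}$ satisfying (4).

$$\Pr[F_{test}] \leq 2m \sum_{j=0}^{m} \Pr[P_j(x) \notin \mathbb{Z}_n^* \text{ and } P_j(x') \in \mathbb{Z}_n^* \mid x, x' \stackrel{\$}{\leftarrow} C]$$
$$\leq 2m(m+1) \max_{0 \leq j \leq m} \left\{ \Pr[P_j(x) \notin \mathbb{Z}_n^* \text{ and } P_j(x') \in \mathbb{Z}_n^* \mid x, x' \stackrel{\$}{\leftarrow} C] \right\}$$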
Abstract. We revisit previous formulations of zero knowledge in the random
oracle model due to Bellare and Rogaway (CCS ’93) and Pass (Crypto ’03), and 
present a hierarchy for zero knowledge that includes both of these formulations. 
The hierarchy relates to the programmability of the random oracle, previously 
studied by Nielsen (Crypto ’02). 


— We establish a subtle separation between the Bellare-Rogaway formulation 
and a weaker formulation, which yields a finer distinction than the separation 
in Nielsen’s work. 

— We show that zero-knowledge according to each of these formulations is not 
preserved under sequential composition. We introduce stronger definitions 
wherein the adversary may receive auxiliary input that depends on the 
random oracle (as in Unruh (Crypto ’07)) and establish closure under 
sequential composition for these definitions. We also present round-optimal 
protocols for NP satisfying the stronger requirements. 

— Motivated by our study of zero knowledge, we introduce a new definition of 
proof of knowledge in the random oracle model that accounts for oracle- 
dependent auxiliary input. We show that two rounds of interaction are 
necessary and sufficient to achieve zero-knowledge proofs of knowledge 
according to this new definition, whereas one round of interaction is 
sufficient in previous definitions. 

— Extending our work on zero knowledge, we present a hierarchy for circuit 
obfuscation in the random oracle model, the weakest being that achieved in 
the work of Lynn, Prabhakaran and Sahai (Eurocrypt ’04). We show that the 
stronger notions capture precisely the class of circuits that is efficiently and 
exactly learnable under membership queries. 


Keywords: zero-knowledge, random oracle model, sequential composition, 
obfuscation. 


1 Introduction 


The random oracle (RO) model, introduced by Fiat and Shamir and refined by 
Bellare and Rogaway [B], was proposed as a framework for designing and analyzing 
cryptographic schemes that offers a trade-off between provable security and practical 
efficiency. In this model, every party has oracle access to a truly random function. 
With this additional functionality, many cryptographic problems admit more efficient 


* Work done while visiting Tsinghua University, Beijing, China. 
solutions than in the standard model, along with considerably simpler proofs of
security BABATI]. In practice, the idealized random function is instantiated using a 
“good” cryptographic hash function, like SHA-1 or a variation thereof. There are also 
cryptographic problems for which we have partial solutions in the random oracle model 
but not in the standard model, most notably that of circuit obfuscation [H920]. In 
both cases, proofs in the random oracle model do not guarantee security or feasibility in 
the standard model (and in fact, there has been substantial evidence to the contrary 
BSB); nonetheless, the model provides a useful idealized test-bed for analyzing 
cryptographic schemes. 

As a first step towards establishing security, it is necessary to define security in 
the random oracle model. A naive extension of a definition in the standard model 
may affect the semantics of the underlying notion of security. Consider the case of 
zero-knowledge proofs, namely proofs that yield no knowledge beyond the validity of 
the assertion proved |I]. Formally, an interactive protocol is zero-knowledge if there 
exists a simulator that can simulate the behavior of every, possibly cheating, verifier 
without access to the prover, such that its output is indistinguishable from the output 
of the verifier after having interacted with the honest prover. In the standard model, 
a zero-knowledge proof is necessarily deniable, in that the protocol’s transcript does 
not constitute any evidence of the interaction, since any party could have generated the 
transcript by himself. However, the Bellare-Rogaway formulation of zero-knowledge 
in the random oracle model does not imply deniability, since the simulator can choose 
the random oracle (22{2 I]. In particular, the formulation allows for (non-trivial) one- 
round zero-knowledge proof systems, and the transcript of such a protocol constitutes 
evidence of participating in the protocol, contradicting deniability. 

In this work, we revisit two aspects of formulating zero-knowledge in the random 
oracle model. The first relates to defining security in the random oracle model and 
in particular, what it means to choose the random oracle, an issue first addressed by 
Nielsen [21]. The second relates to a different aspect of zero-knowledge proofs, namely, 
we want the zero-knowledge guarantee to hold even if the verifier may have some 
additional a priori information about the input. The need to account for such auxiliary 
input, which arises in typical applications such as sequential repetitions of a protocol, 
was articulated in the work of Goldreich and Oren |I and again in that of Unruh 
[25]. While the Bellare-Rogaway formulation of zero-knowledge does take into account
auxiliary input, it does not allow for dependencies between the auxiliary input and the 
random oracle, which arise for instance, when the auxiliary input is a transcript of a 
previous interaction using the oracle. 


1.1 Programmability in the Random Oracle Model 


There are two reasons why, in the simulation-based paradigm, it is easier to achieve 
security in the random oracle model: 


— the simulator can see the queries parties make to the random oracle; 
— the simulator can choose the answers to these queries. 


The second is what we refer to as programming the random oracle, and may be qualified 
in several different ways. Suppose our goal is to simulate a transcript RO(s), namely 
the evaluation of the random oracle RO at some value s. Our intuition about the
random oracle as a truly random function indicates that picking a truly random string $\tau$
should suffice (essentially choosing the evaluation of RO at s to be $\tau$), and indeed, no
distinguisher, even a computationally unbounded one, can distinguish a truly random
string from RO(s), provided the distinguisher does not get access to RO. On the other
hand, if we give the distinguisher access to RO, then the only “good” simulation of 
the transcript is RO(s), and the simulation must query RO at s. This is because the 
distinguisher may have s hardwired into it, then queries RO at s and checks whether 
the answer matches the transcript. In this setting, the simulator does not get to choose 
the answers to oracle queries. To distinguish between these two notions of security, 
we will refer to the former as the fully programmable random oracle (FPRO) model, 
and the latter as the non-programmable random oracle (NPRO) model (as coined by
Nielsen [21]).
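The contrast between the two settings can be illustrated with a toy experiment. The sketch below is only an illustration under stated assumptions: the lazily sampled `RandomOracle` class, the 16-byte output length, and all function names are hypothetical and do not come from the formal definitions.

```python
import os


class RandomOracle:
    """Lazily sampled random function from byte strings to n-byte strings."""

    def __init__(self, n=16):
        self.n = n
        self.table = {}

    def __call__(self, query):
        # Sample a fresh answer on first use, then answer consistently.
        if query not in self.table:
            self.table[query] = os.urandom(self.n)
        return self.table[query]


def simulate_without_oracle(n=16):
    """FPRO-style simulation of the transcript RO(s): a fresh random string."""
    return os.urandom(n)


def simulate_with_query(ro, s):
    """NPRO-style simulation: the only good transcript is RO(s) itself."""
    return ro(s)


def distinguisher(ro, s, transcript):
    """Distinguisher with s hardwired and oracle access: accept iff RO(s) matches."""
    return ro(s) == transcript
```

A transcript produced by `simulate_with_query` always passes the check, whereas one produced by `simulate_without_oracle` passes only with probability $2^{-128}$ for these toy parameters. Without oracle access, the distinguisher has no such check available.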

In the case where we allow the simulator to choose the answers to oracle queries, 
we may still impose an additional requirement, namely that the simulator must output 
its choices of these query/answer pairs. In the above example, if the simulator
chooses the output of RO at s to be some random string $\tau$, its output will include the
transcript $\tau$, along with the list $(s, \tau)$, corresponding to the query s and answer $\tau$. This
is in fact the notion of programmability raised by Bellare and Rogaway for zero- 
knowledge, and we will refer to this as the explicitly programmable random oracle 
(EPRO) model. We defer a precise definition to the body of the paper, but note at this 
point that security in the non-programmable random oracle model (strongest security 
guarantee) implies security in the explicitly programmable random oracle model, which 
in turn implies security in the fully programmable random oracle model (weakest 
security guarantee). 


1.2 Our Contributions and Techniques 


Hierarchy for zero knowledge. We begin with a simple and unified framework for 
defining zero knowledge in the three variants of the random oracle model, and then 
present a (perhaps surprising) separation for zero knowledge in the fully programmable 
and explicitly programmable random oracle models. This yields a finer separation than 
that in Nielsen’s work [21], and complements Pass’s separation for zero knowledge in
the explicitly programmable and non-programmable random oracle models. 


Auxiliary input and sequential composition. Following the work of Goldreich et al. 
for zero-knowledge in the standard model, we use closure under sequential 
composition as a yardstick for evaluating formulations of zero-knowledge. We show 
that zero-knowledge in each of the three variants of the random oracle model is not closed
under sequential composition.¹ This motivates a new formulation of zero-knowledge
in the random oracle which allows for auxiliary inputs that depend on the oracle, as 


1 That this may be the case has been previously noted (e.g. [22I]), but to our knowledge, there
has been no formal (published) proof. 
was done in [25] for one-way functions, encryption and other primitives.² We show that
for efficient-prover protocols, zero-knowledge with oracle-dependent auxiliary input in
the explicitly programmable and non-programmable random oracle models is preserved
under a polynomial number of sequential repetitions. We also present round-optimal
protocols for NP satisfying the new formulations of zero-knowledge. 


Proofs of knowledge. Our constructions demonstrating that previous formulations of
zero knowledge are not closed under sequential composition implicitly rely on
non-interactive zero-knowledge “proofs of knowledge” in the random oracle model.
Specifically, non-interactive protocols are necessarily malleable (without unique identifiers),
and the cheating verifier can generate a convincing proof of knowledge by copying 
one sent by the prover in a previous iteration of the protocol. This motivates a new 
formulation of proof of knowledge in the random oracle model that takes into account 
oracle-dependent auxiliary input. We show that two rounds of interaction are necessary 
and sufficient to achieve zero-knowledge proofs of knowledge according to this new 
definition. 


Circuit obfuscation. We extend our framework for programmability to circuit
obfuscation³ in the random oracle model [9f], and note that the obfuscator constructions of
Lynn et al. [19] achieve security in the fully programmable random oracle model. Next,
we show circuit obfuscation in the explicitly programmable random oracle model can
only be realized for classes of circuits that are efficiently and exactly learnable under 
membership queries, and for these classes, obfuscation may be (trivially) realized in 
the plain model, so the characterization is exact. We find it surprising that we can have 
non-trivial constructions in the explicitly programmable model for zero knowledge but 
not for circuit obfuscation. 


1.3 Discussion 


Formulating zero-knowledge. A general framework for defining security in the random 
oracle model was presented by Nielsen [21], based on augmenting the universally
composable (UC) framework with a random oracle functionality. This guarantees 
composability. As pointed out by Pass [22], deniability is not guaranteed in this 
framework. Nielsen also defined security with a non-programmable random oracle, 


2 For the primitives considered in [25], the random oracle is typically only used in the proof
of security. Specifically, Unruh does not explicitly address primitives with a simulation- 
based notion of security, which is the focus of this work and where the random oracle is also 
exploited in constructing a simulator. On the other hand, Unruh considers a stronger notion 
of oracle-dependent auxiliary input, where a polynomial bound is imposed only on the output 
length of the machine generating the auxiliary input and not its query complexity. 

3 We use the term obfuscation to refer to the stronger notion of obfuscation against general 
adversaries, instead of obfuscation against predicate adversaries [26]. In the standard model, 
only classes of circuits that are efficiently and exactly learnable under membership queries 
are obfuscatable against general adversaries [26]. The result also extends to the fully- 
programmable RO model. 
where the environment in the UC framework is also given access to the random oracle.
This offers deniability, but may no longer guarantee universal composability. 

Instead of adopting Nielsen’s formulation, we consider a minimal framework (based 
on BHA) for which we can provide the weaker guarantee of sequential composition. 
The simplicity allows us to focus on how the random oracle is incorporated differently 
in each of [21,3,22]. In addition, it offers several conceptual advantages: it offers
modularity (which allows us to decouple the zero-knowledge property from the proof- 
of-knowledge property and other properties implied by UC zero-knowledge, and for 
impossibility results, these distinctions are particularly important) and reinforces the 
theme of this work, that of understanding how semantics can change between the 
standard model and the random oracle model. Furthermore, our framework is simple 
enough to be applied to circuit obfuscation, for which we have very few non-trivial 
positive results, let alone constructions that compose arbitrarily (which probably only 
exist for trivially obfuscatable families of circuits). 


Sequential composition not the end-goal. We recall the arguments used in [4] to 
motivate the study of auxiliary-input zero-knowledge: first, it fully captures the intuitive 
meaning of the concept of zero-knowledge; and second, this stronger requirement is 
necessary when a zero-knowledge protocol is used as a sub-protocol within a larger
cryptographic protocol. It is for these same reasons that we pursue a formulation
of zero-knowledge in the random oracle model that incorporates auxiliary input (refer 
to for additional arguments). Indeed, we regard our sequential composition lemma 
as evidence that we have properly accounted for auxiliary input in our formulation 
and not a goal in and of itself. Similarly, constructing protocols for NP that remain
zero-knowledge under sequential composition should not be an objective in itself.⁵
Neither should a generic method for transforming zero-knowledge protocols
into ones that remain zero-knowledge under sequential composition.


On “explicit programmability”. From previous work [3,22,19,26], we know that
allowing the simulator to program the random oracle is necessary and sufficient for 
one-round zero-knowledge protocols for NP and obfuscating point functions in the 
random oracle model. However, while explicit programmability is sufficient for zero- 
knowledge, we show that full programmability is necessary for the latter. This means 
that the reason we are able to realize non-trivial circuit obfuscation in the random oracle 
model comes not only from programming the random oracle, but also from not having 
to specify explicitly how we program the random oracle. 

The issue of explicit programmability also arises in the study of sequential composi- 
tion. To obtain zero-knowledge that is closed under a polynomial number of sequential 


4 One may ask, why not aim for universal composability then? This is addressed in the previous 
paragraph, and as with previous work in the standard setting, we feel that zero-knowledge w.r.t. 
auxiliary input is indeed the right compromise. 

5 All “natural” zero-knowledge protocols for NP in the RO model (in the [3] sense) remain zero-
knowledge under sequential, even concurrent, composition, but this does not obliterate the
need for the “right” definition. After all, when auxiliary-input zero-knowledge was introduced, 
all known zero-knowledge protocols were black-box and therefore remained zero-knowledge 
under sequential composition. 
compositions, it appears that explicit programmability is necessary, in addition to
properly accounting for auxiliary input. 


Self-composition for circuit obfuscation. Lynn et al. introduce self-composition for 
circuit obfuscation [19], a notion of composition analogous to sequential composition. 
In addition, they give an obfuscator for point functions in the random oracle model 
that is not 2-self-composing. This is because the construction is not a valid obfuscator 
w.r.t. dependent auxiliary input. To obtain polynomial self-composition for obfuscation 
using techniques in this work, we will need a definition that incorporates both oracle- 
dependent auxiliary input and explicit programmability. 


2 Preliminaries 

A negligible function is a function of the form $n^{-\omega(1)}$, and is denoted neg(n). We use
PPT as an abbreviation for a probabilistic (strict) polynomial-time Turing machine. We 
also consider the nonuniform and oracle analogues, which we denote by nonuniform 
PPT and oracle PPT respectively. In probability expressions that involve a probabilistic 
computation, the probability is also taken over the internal coin tosses of the underlying 
computation. We refer the reader to [IJ] for definitions of interactive proof systems, 
zero-knowledge, proofs of knowledge and witness-indistinguishability (WI) in the 
standard model. For a relation $R \subseteq \{0,1\}^* \times \{0,1\}^*$, the language associated with
R is $L_R = \{x : \exists y\ (x,y) \in R\}$.


3 Zero Knowledge in the Random Oracle Model 


In this section, we present our hierarchy of formulations for zero knowledge in the RO 
model, along with those that account for oracle-dependent auxiliary input. We begin 
with several formalisms we will use in defining zero knowledge: 


— We use RO to denote the random oracle and $\varepsilon$ to denote an oracle that returns the
empty string on all inputs.

— Given a function $f : \{0,1\}^* \to \{0,1\}^*$ and a list $\ell \subseteq \{0,1\}^* \times \{0,1\}^*$, we use
$f[\ell]$ to denote a function that agrees with f everywhere except on inputs specified
by the set $\ell$. Specifically,

$$f[\ell](x) = \begin{cases} y & \text{if } \exists! y \text{ such that } (x, y) \in \ell \\ f(x) & \text{otherwise} \end{cases}$$

Informally, we refer to $f[\ell]$ as the function obtained by programming f on the
inputs in $\ell$. In the definition of zero-knowledge, the simulator generates a pair $(\tau, \ell)$:
the simulator programs the random oracle on the inputs in $\ell$, and $\tau$ corresponds to
the view of the cheating verifier while interacting with the prover using the oracle
$RO[\ell]$.
— We allow the auxiliary input to be generated by a nonuniform oracle PPT Z
(the nonuniformity allows for auxiliary information that may depend on the input
instance), which we refer to as the auxiliary input machine. We will give Z oracle
access to either $\varepsilon$ or RO, depending on whether we allow the auxiliary input to
depend on the random oracle.
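The programming operation on a function f can be sketched in a few lines of Python. This is a minimal illustration, assuming f is any Python callable and the list is given as an iterable of (x, y) pairs; the name `program` is hypothetical. It follows the unique-existence condition literally: an input x is overridden only if exactly one y is paired with it.

```python
from collections import defaultdict


def program(f, pairs):
    """Return the programmed function: agrees with f except where pairs fixes a unique answer."""
    answers = defaultdict(set)
    for x, y in pairs:
        answers[x].add(y)

    def patched(x):
        ys = answers.get(x)
        # Override only if there is exactly one y with (x, y) in the list.
        if ys is not None and len(ys) == 1:
            return next(iter(ys))
        return f(x)

    return patched
```

For instance, with `f = lambda x: x[::-1]` and the list `[("ab", "Z"), ("cd", "1"), ("cd", "2")]`, the programmed function maps "ab" to "Z" but still maps "cd" to "dc", because "cd" has two conflicting answers.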


Definition 1 (zero-knowledge [3,22]). Let (P,V) be an interactive protocol for a
language $L = L_R$. Let V*, S, Z, D be oracle PPTs. Given $(x,w) \in R$, we define
$\mathrm{diff}^{O_1,O_2,O_3}_{V^*,S,Z,D}(x,w)$ to be the quantity

$$\Pr[z \leftarrow Z^{O_1}(1^{|x|});\ \tau \leftarrow (P^{RO}(w), V^{*RO}(z))(x);\ D^{O_2}(x,\tau,z) = 1]$$
$$-\ \Pr[z \leftarrow Z^{O_1}(1^{|x|});\ (\tau,\ell) \leftarrow S^{RO}(x,z);\ D^{O_3}(x,\tau,z) = 1]$$

We say that (P,V) is zero-knowledge in the fully programmable random oracle
model (FPRO) if for every oracle PPT V*, there exists an oracle PPT S such that for
all $(x,w) \in R$ and for all nonuniform oracle PPTs Z and D, $\mathrm{diff}^{\varepsilon,\varepsilon,\varepsilon}_{V^*,S,Z,D}(x,w)$ is
negligible (as a function of |x|). In addition, we obtain zero-knowledge in the:


explicitly programmable RO (EPRO) model if $O_1 = \varepsilon$, $O_2 = RO$, $O_3 = RO[\ell]$
non-programmable RO (NPRO) model if $O_1 = \varepsilon$, $O_2 = RO$, $O_3 = RO$
FPRO model w.r.t. dependent auxiliary input if $O_1 = RO$, $O_2 = \varepsilon$, $O_3 = \varepsilon$
EPRO model w.r.t. dependent auxiliary input if $O_1 = RO$, $O_2 = RO$, $O_3 = RO[\ell]$
NPRO model w.r.t. dependent auxiliary input if $O_1 = RO$, $O_2 = RO$, $O_3 = RO$


For simplicity, we will also refer to the respective notions of zero-knowledge as FPRO 
zero-knowledge, EPRO zero-knowledge, NPRO zero-knowledge, auxiliary-input FPRO 
zero-knowledge, auxiliary-input EPRO zero-knowledge, and auxiliary-input NPRO zero- 
knowledge. 
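The six variants differ only in which oracles the auxiliary input machine and the distinguisher receive, so they can be summarized as a lookup table. The sketch below is a reading aid under stated assumptions: the labels, the dictionary layout, and the helper `resolve` are hypothetical, and the row for plain FPRO (all three oracles empty) is inferred from the auxiliary-input FPRO row rather than quoted.

```python
EMPTY, RO, RO_PROGRAMMED = "eps", "RO", "RO[l]"

# variant -> (O1: oracle of Z,
#             O2: oracle of D in the real experiment,
#             O3: oracle of D in the simulated experiment)
VARIANTS = {
    "FPRO": (EMPTY, EMPTY, EMPTY),  # assumption: inferred from the aux-FPRO row
    "EPRO": (EMPTY, RO, RO_PROGRAMMED),
    "NPRO": (EMPTY, RO, RO),
    "aux-FPRO": (RO, EMPTY, EMPTY),
    "aux-EPRO": (RO, RO, RO_PROGRAMMED),
    "aux-NPRO": (RO, RO, RO),
}


def aux_input_depends_on_oracle(variant):
    """True if the auxiliary input machine Z is given access to RO."""
    return VARIANTS[variant][0] == RO


def resolve(label, ro, pairs):
    """Turn a label into a callable oracle; ro is a callable, pairs is the simulator's list."""
    if label == EMPTY:
        return lambda query: b""  # returns the empty string on all inputs
    if label == RO:
        return ro
    overrides = dict(pairs)  # simplification: assumes each query occurs at most once
    return lambda query: overrides[query] if query in overrides else ro(query)
```

Reading the table row by row shows that the distinguisher sees the real oracle in the simulated experiment only in the NPRO variants, and a programmed oracle only in the EPRO variants.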


Zero-knowledge in the FPRO model. This definition captures the weakest requirement, 
in that the simulator may choose the random oracle in the simulated transcript, as 
long as it “looks” random. We point out that we require simulating the output of the 
cheating verifier, but not the random oracle used in the simulated transcript. This 
is equivalent to a definition wherein the simulator S is given access to RO. Since 
the distinguisher does not have access to RO, the simulator can simply generate 
a random oracle by itself, so giving the simulator access to RO does not give the 
simulator any extra power. Note that this definition also constitutes a relaxation 
of the UC framework augmented with a random oracle functionality (namely, 
that obtained by replacing the interactive environment with a non-interactive 
distinguisher) [ZIB]. 


Zero-knowledge in the EPRO model. The main qualitative difference between FPRO
zero-knowledge and EPRO zero-knowledge⁶ is that the simulator is required to
completely specify a simulated random oracle (namely $RO[\ell]$) in the latter, which


6 Indeed, making this distinction in the UC framework would require clumsy modifications.


424 H. Wee 


the distinguisher is given access to.⁷ We require that S specifies $\ell$ explicitly, which
implies a polynomial bound on the size of $\ell$. On the other hand, the oracle $RO[\ell]$ is
specified implicitly. EPRO zero-knowledge is equivalent to the Bellare-Rogaway
formulation, except the latter does not give the simulator oracle access to RO.
As with zero-knowledge in the FPRO model, this does not make any qualitative
difference as the simulator can simply generate random answers to the RO queries
and add these query-answer pairs to the list $\ell$.


Zero-knowledge in the NPRO model. Here, the simulator is not allowed to choose the 
random oracle in the simulated transcript. This implies deniability, and is equivalent 
to Pass’s formulation [22]. It is a special case of EPRO zero-knowledge with $\ell = \emptyset$.
For efficient-prover protocols, the NPRO zero-knowledge requirement is equivalent
to requiring that the following quantity be negligible [P28]:

$$\mathbb{E}_{RO}\Big[\,\big|\Pr[z \leftarrow Z^{O_1}(1^{|x|});\ D^{RO}(x, (P^{RO}(w), V^{*RO}(z))(x), z) = 1]$$
$$-\ \Pr[z \leftarrow Z^{O_1}(1^{|x|});\ D^{RO}(x, S^{RO}(x, z), z) = 1]\big|\,\Big]$$


This is also true w.r.t. dependent auxiliary input. 


Incorporating dependent auxiliary input. Incorporating dependent auxiliary input 
provides some guarantee of “independence” between the queries made to the 
random oracle in the protocol and prior queries, even though we do not know 
what the prior queries are. To achieve this definition, we construct simulators that
program the random oracle on inputs that have not been previously queried by Z
(here, we exploit the polynomial bound on the query complexity of Z). Unlike the
case without auxiliary input, it is essential that we provide the simulator for FPRO zero-
knowledge and EPRO zero-knowledge with oracle access to RO so that the simulator
may generate transcripts that are consistent with the output from Z.


Verifier’s view. A common convention in defining zero-knowledge in the standard
model is to use $(P^{RO}(w), V^{*RO}(z))(x)$ to denote the view of the verifier V*, which
consists of the protocol’s transcript and the verifier’s random tape, instead of the
output of the verifier. This is because we may incorporate the computation of the
output from the view into the distinguisher. This argument does not necessarily
apply to definitions in the RO model. In this case, the distinguisher does not have
access to RO and may not be able to compute the output from the view.⁸ Therefore,
we reserve $(P^{RO}(w), V^{*RO}(z))(x)$ to denote the output of the verifier.


A note on black-box simulation. As with previous works on zero knowledge in the RO 
model, we will establish the zero-knowledge property via black-box simulation, 


7 For some secret value s and a random RO, we may easily simulate a view of RO(s) with a 
random string. However, in order to simulate a view of RO(s) along with an oracle that is 
consistent with this view, we will need to either query RO at s or program RO at s; either 
operation requires “knowing” s. 

8 Simply requiring that the verifier’s query/answer pairs be included in its view may not be 
sufficient as we may also need the prover’s query/answer pairs. 


Zero Knowledge in the Random Oracle Model, Revisited 425 


except we will allow the simulator to see the oracle queries made by the cheating 
verifier. This is consistent with the definition of zero knowledge because the 
simulator can execute the code of the cheating verifier and observe the oracle 
queries made during the executions. This is a crucial advantage over mere black- 
box simulation of the cheating verifier in the standard model. On the other hand, 
we do not allow the simulator to see the oracle queries made by Z. Consider a 
typical application, namely that of sequential repetitions of the protocol. Here, the 
auxiliary input is a transcript from previous executions of the protocol and may 
therefore depend on the oracle RO. The cheating verifier receives the transcript, 
but does not gain access to the private coin tosses used to generate the transcript. 
The distinction arises from the fact that we allow the simulator to depend on the 
cheating verifier but not on Z. 


4 Zero-Knowledge Protocols and Separations 


Several constructions of zero-knowledge protocols for NP in the RO model were
given in [3,22]. It is straightforward to verify that the zero-knowledge protocol in
[3] is also auxiliary-input EPRO zero-knowledge. In an unpublished work [23], Pass
determined the round-complexity of auxiliary-input NPRO zero-knowledge protocols 
for NP. We summarize these results below: 


Theorem 1 (protocols [3,22,23]). Assuming the existence of one-way functions, there
exist:


— a one-round proof of knowledge protocol for NP that is auxiliary-input EPRO zero-
knowledge (moreover, we may assume that the knowledge extractor is straight-line
and runs in strict polynomial time);

— a two-round protocol for NP that is NPRO zero-knowledge; and

— a 3-round protocol for NP that is auxiliary-input NPRO zero-knowledge.


Furthermore, each of these protocols has perfect completeness, negligible soundness, 
and an efficient prover. 


Theorem 2 (triviality [22,23]). Only languages in BPP have a one-round NPRO zero-
knowledge protocol or a 2-round auxiliary-input NPRO zero-knowledge protocol.


We outline the proofs in [23]. The 3-round auxiliary-input NPRO zero-knowledge
protocol for NP is based on the 2-round NPRO zero-knowledge protocol in [22] except
we have the prover pick a random prefix $\alpha$ in the first round, and prepend $\alpha$ to all
prover’s and verifier’s queries to the random oracle. The proof of Theorem 2 follows
essentially from the fact that the proofs of the analogous statements in the standard model
[A] relativize.

Next, we state our first result, separating FPRO and EPRO zero-knowledge. 


Theorem 3. Assuming the existence of one-way permutations, there exists a protocol 
that is auxiliary-input FPRO zero-knowledge but not EPRO zero-knowledge. 
[Figure 1: diagram of implications and separations among FPRO ZK, EPRO ZK and NPRO ZK]


Fig. 1. Relations between different variants of zero knowledge in the RO model, assuming the
existence of one-way permutations. An arrow is an implication, and a crossed arrow indicates a
separation. We stress that the relations refer to protocols satisfying the respective notion of zero-
knowledge.


Proof. Let $\pi$ be a one-way permutation, and consider the following protocol for the
relation $R = \{(x,w) \mid x = \pi(w)\}$, where $L_R = \{0,1\}^*$ (note that soundness holds
vacuously):


Common input: An instance $x \in \{0,1\}^n$.
Prover’s private input: A witness $w \in \{0,1\}^n$.
P → V: Sends $\alpha \leftarrow \{0,1\}^n$.
V → P: Sends $\tau \leftarrow \{0,1\}^n$.
P → V: If $\tau = RO(\alpha \circ w)$, send w; else, send $RO(\alpha \circ w)$.
Verification: V always accepts.


To see that this protocol is auxiliary-input FPRO zero-knowledge⁹, fix a cheating
verifier V* (along with its random tape and an auxiliary input z from Z), pick a random
$\alpha$, and simulate the execution of V*, forwarding the oracle queries made by V* to
RO, until we obtain its first message $\tau$. During the simulation, we also check if any
of V*’s queries matches $\alpha \circ w$ (which we can check efficiently given x). If so, we
would have recovered w, and may successfully compute the output of V*. If we do
not manage to recover w, we simulate the prover’s response with a random string $\tau' \in
\{0,1\}^n$ and continue to simulate the execution of V*, forwarding all oracle queries
to RO, unless the query matches $\alpha \circ w$, in which case we respond with $\tau'$. This is fine
because with probability 1 − neg(n) over $\alpha$, none of the queries made by Z has prefix
$\alpha$. This completes the description of the zero-knowledge simulator.

Suppose on the contrary that the protocol is EPRO zero-knowledge, and consider the
simulator S that outputs the view of the honest verifier. Fix $x \in L$, and consider a
distinguisher with $w = \pi^{-1}(x)$ hardwired into it. Then, S must output a transcript that
contains $RO[\ell](\alpha \circ w)$ with probability 1 − neg(n). For the latter, S must, with high
probability, either query RO at $\alpha \circ w$ or output a list $\ell$ that contains the string $\alpha \circ w$. In
both cases, we may derive a PPT that on input x, outputs $\pi^{-1}(x)$ with high probability,
which contradicts $\pi$ being one-way.


9 Informally, the prover uses $(\alpha, RO(\alpha \circ w))$ to check whether the verifier already “knows” $w$.
5 Sequential Composition Fails without Dependent Auxiliary Input


In this section, we present zero-knowledge protocols which are no longer zero- 
knowledge when executed twice sequentially. The protocols are similar in spirit to 
that in [9], a zero-knowledge protocol in the standard model that is no longer zero- 
knowledge when executed twice in parallel. The protocols exploit zero-knowledge 
proofs of knowledge (which may be realized non-interactively in the random oracle 
model), using these proofs as the auxiliary input which a cheating verifier could use 
to “gain knowledge”. Specifically, the prover will send the verifier a zero-knowledge 
proof of knowledge of the witness, and the cheating verifier will copy this proof to 
“claim” knowledge of the witness. The apparent contradiction arises from a problem
in the definition of proofs of knowledge in the random oracle model, an issue we will
address in Section 7.


Theorem 4. Assuming the existence of one-way functions, FPRO zero knowledge, 
EPRO zero knowledge, and NPRO zero knowledge are not closed under sequential 
composition. 


Proof (sketch). We begin by constructing an EPRO zero-knowledge protocol that is 
no longer zero-knowledge when composed twice. The protocol is for the language 
L corresponding to the relation R = {(x,w) | x = f(w)}, where f is a one-way 
function, and we use as an underlying protocol a one-round EPRO zero-knowledge proof 
of knowledge protocol (from Theorem[]). 


Common input: An instance $x \in \{0,1\}^n$.
Prover's private input: A witness $w \in \{0,1\}^n$.
$V \to P$: Send a random string $\tau$.
$P \to V$: If $\tau$ is an EPRO zero-knowledge proof of knowledge that $x \in L$,
send $w$; else, send an EPRO zero-knowledge proof of knowledge
that $x \in L$.
verification: $V$ accepts if it receives either $w$ such that $f(w) = x$ or an
accepting proof of knowledge that $x \in L$.


To prove EPRO zero-knowledge, the simulator runs the cheating verifier to obtain the
first message $\tau$. If $\tau$ is an accepting proof of knowledge for $x \in L$, the simulator runs
the knowledge extractor to obtain a valid witness $w$. Otherwise, the simulator runs the
zero-knowledge simulator for the underlying zero-knowledge protocol to generate the
second-round message. Strictly speaking, we require the underlying zero-knowledge
protocol to be auxiliary-input EPRO zero-knowledge, which is fine.

To see that this protocol is not zero-knowledge when composed twice, consider the
cheating verifier $V^*$ that sends a random string in the first execution, and sends the
prover's response as its first message in the second execution. For all $x \in L$, the
transcript between the honest prover and $V^*$ (for two sequential repetitions) will contain
$f^{-1}(x)$ with probability 1. That $f$ is one-way implies that there is no PPT simulator for
two sequential repetitions of this protocol.
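
The replay strategy of $V^*$ can be illustrated with a toy model. The sketch below makes strong simplifying assumptions that are not part of the paper: SHA-256 stands in for the one-way function $f$, and the non-interactive proof of knowledge is replaced by a hypothetical idealized object (`IdealProofSystem`) that accepts exactly the tokens it has issued for a valid witness. It only demonstrates the message flow of the two sequential executions, not any real proof system.

```python
import hashlib
import secrets


def f(w: bytes) -> bytes:
    """Stand-in for the one-way function f (SHA-256 here, an assumption)."""
    return hashlib.sha256(w).digest()


class IdealProofSystem:
    """Idealized proof of knowledge: verify() accepts only tokens that
    prove() issued for a valid witness. Purely illustrative."""

    def __init__(self):
        self._issued = set()

    def prove(self, x: bytes, w: bytes) -> bytes:
        if f(w) != x:
            raise ValueError("invalid witness")
        token = secrets.token_bytes(32)
        self._issued.add((x, token))
        return token

    def verify(self, x: bytes, token: bytes) -> bool:
        return (x, token) in self._issued


def honest_prover(pok: IdealProofSystem, x: bytes, w: bytes, tau: bytes) -> bytes:
    """Second-round message: reveal w if tau is an accepting proof, else prove."""
    if pok.verify(x, tau):
        return w
    return pok.prove(x, w)


def replay_attack(prover) -> bytes:
    """Cheating verifier: random string first, then replay the prover's reply."""
    first_reply = prover(secrets.token_bytes(32))   # execution 1
    return prover(first_reply)                      # execution 2
```

Running `replay_attack` against the honest prover returns the witness after two executions, which is the behaviour the argument above exploits.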

A similar modification to Pass’s 2-round NPRO zero-knowledge protocol for NP 
yields a 2-round NPRO zero-knowledge protocol that is no longer NPRO zero-knowledge 
when composed twice. 


Remark 1. Our counter-examples are efficient-prover protocols (looking ahead, our
sequential composition theorem only holds for efficient-prover protocols). This means
that a cheating verifier (which is allowed to be nonuniform) can in fact simulate an 
interaction between the honest prover and the honest verifier. This is different from 
the counter-example in wherein the cheating verifier cannot simulate such an 
interaction. There, the honest prover is allowed nonuniformity, whereas the cheating 
verifier is not, and the counter-example exploits the fact that the honest prover is “more 
powerful” than the class of cheating verifiers in an essential manner. This distinction 
was previously made in [O]. 


6 Sequential Composition with Dependent Auxiliary Input 


Next, we prove a sequential composition lemma for auxiliary-input EPRO zero- 
knowledge and auxiliary-input NPRO zero-knowledge, which confirms that these are 
in some sense indeed the “right” definitions. 


Theorem 5 (sequential composition). Let $(P, V)$ be an efficient-prover protocol in the
RO model. Let $Q(\cdot)$ be a polynomial, and let $(P_Q, V_Q)$ be an interactive protocol that
on common input $x \in \{0,1\}^n$, proceeds in $Q(n)$ phases, each of them consisting of
executing the interactive protocol $(P, V)$ on common input $x$ (with independent coin
tosses for $P$). If $(P, V)$ is auxiliary-input EPRO (resp. NPRO) zero-knowledge, then
$(P_Q, V_Q)$ is also auxiliary-input EPRO (resp. NPRO) zero-knowledge.


Proof (sketch). We begin with the proof for EPRO zero-knowledge. The high-level
structure is similar to that in [IA] for establishing a sequential composition lemma for
zero-knowledge proofs in the standard model. We start by partitioning the cheating
verifier $V_Q^*$ for $(P_Q, V_Q)$ into $Q(n)$ phases, each of which is the execution of a verifier
$V^*$ for a stand-alone protocol $(P, V)$. $V^*$ takes as input the common input $x$ and an
auxiliary string encoding the state$^{10}$ for $V_Q^*$ at the end of some phase $i$ (the string also
encodes $i$) of an interaction with $P_Q$, and upon interacting with $P$ produces as output
another string encoding the state for $V_Q^*$ at the end of phase $i + 1$. The zero-knowledge
property of $(P, V)$ then guarantees a simulator $S$ for $V^*$.

We generalize the earlier notation for programming a function by recursively
defining $RO[\ell_1, \ldots, \ell_{i+1}]$ as $(RO[\ell_1, \ldots, \ell_i])[\ell_{i+1}]$. We may now specify the simulator
for $V_Q^*$ as follows: on input $(x, z)$,


10 For simplicity, we may think of the string as encoding the transcript for the first $i$ phases of
interaction with $P_Q$ along with the random tape and auxiliary input for $V_Q^*$.
- Set $\tau_0 = z$.
- Run the simulator $S$ for $Q$ phases using the simulated oracle generated in
the previous phase; that is, for $i = 1, 2, \ldots, Q$,

$S^{RO[\ell_1, \ldots, \ell_{i-1}]}(x, \tau_{i-1}) \to (\tau_i, \ell_i)$

- Output $(\ell_1 \cup \cdots \cup \ell_Q, \tau_Q)$.$^{11}$
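
The recursive programming notation can be made concrete with a short Python sketch. It rests on assumptions that are not spelled out in this section: each list is represented as a dictionary from queries to answers, $RO[\ell]$ answers from $\ell$ where $\ell$ is defined and from RO elsewhere, and SHA-256 stands in for RO. The names `base_oracle`, `program`, `program_all` and `merge` are hypothetical.

```python
import hashlib


def base_oracle(query: bytes) -> bytes:
    """Stand-in for RO (SHA-256 here, an assumption for the sketch)."""
    return hashlib.sha256(query).digest()


def program(oracle, ell):
    """Return the oracle programmed with ell: answers come from ell where
    it is defined and from the given oracle everywhere else."""
    table = dict(ell)

    def programmed(query: bytes) -> bytes:
        return table[query] if query in table else oracle(query)

    return programmed


def program_all(oracle, lists):
    """Apply the recursive definition: program with each list in turn,
    so later lists take precedence over earlier ones."""
    for ell in lists:
        oracle = program(oracle, ell)
    return oracle


def merge(lists):
    """A single table that has the same effect as programming the lists
    one after another (later entries override earlier ones)."""
    merged = {}
    for ell in lists:
        merged.update(ell)
    return merged
```

Under these assumptions, programming once with `merge(lists)` gives the same oracle as `program_all`, which is the property the combined list in the simulator's output is meant to have.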


We define $Q(n) + 1$ hybrids, $H_0, \ldots, H_Q$. The $j$'th hybrid is defined as the output of
the following experiment:

- Run $Z^{RO}(1^n) \to z$, $\tau_0 = z$.
- Let the honest prover interact with the cheating verifier for $j$ phases; that
is, for $i = 1, 2, \ldots, j$,

$(P^{RO}, V^{*RO}(\tau_{i-1}))(x) \to \tau_i$

- For the remaining $Q - j$ phases, run the simulator $S$ using the simulated
oracle generated in the previous phase; that is, for $i = j + 1, \ldots, Q$,

$S^{RO[\ell_{j+1}, \ldots, \ell_{i-1}]}(x, \tau_{i-1}) \to (\tau_i, \ell_i)$

- $H_j$ is $(RO[\ell_{j+1}, \ldots, \ell_Q], \tau_Q, z)$.


Note that $H_0$ and $H_Q$ correspond to the simulated transcript and the actual transcript,
respectively. We need to show that $H_0$ and $H_Q$ are computationally indistinguishable.
Suppose on the contrary that this is not the case. Then we have a nonuniform
oracle PPT $D$ that distinguishes two consecutive hybrid distributions $H_j$ and $H_{j+1}$. We
define an auxiliary input machine $Z_j$ that computes the interaction between $P$ and $V^*$
for the first $j$ phases:

- Run $Z^{RO}(1^n) \to z$, $\tau_0 = z$.
- For $i = 1, 2, \ldots, j$, $(P^{RO}, V^{*RO}(\tau_{i-1}))(x) \to \tau_i$.
- Output $(z, \tau_j)$.


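This allows us to rewrite $H_j$ and $H_{j+1}$ as follows:

$H_j$:
- Run $Z_j^{RO}(1^n) \to (z, \tau_j)$.
- $S^{RO}(x, \tau_j) \to (\tau_{j+1}, \ell_{j+1})$
- For $i = j + 2, \ldots, Q$, $S^{RO[\ell_{j+1}, \ldots, \ell_{i-1}]}(x, \tau_{i-1}) \to (\tau_i, \ell_i)$
- Output $(RO[\ell_{j+1}, \ldots, \ell_Q], \tau_Q, z)$.

$H_{j+1}$:
- Run $Z_j^{RO}(1^n) \to (z, \tau_j)$.
- $(P^{RO}, V^{*RO}(\tau_j))(x) \to \tau_{j+1}$
- For $i = j + 2, \ldots, Q$, $S^{RO[\ell_{j+2}, \ldots, \ell_{i-1}]}(x, \tau_{i-1}) \to (\tau_i, \ell_i)$
- Output $(RO[\ell_{j+2}, \ldots, \ell_Q], \tau_Q, z)$.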


11 We abuse $\cup$ slightly here; we want $\ell_1 \cup \cdots \cup \ell_Q$ to denote the set satisfying
$RO[\ell_1 \cup \cdots \cup \ell_Q] = RO[\ell_1, \ldots, \ell_Q]$.
This is sufficient to contradict zero-knowledge of $(P, V)$ for the $j + 1$'th phase.$^{12}$ The
result for NPRO zero-knowledge follows as a special case corresponding to
$\ell_1 = \cdots = \ell_Q = \bot$.


Remark 2. One might expect a priori that zero knowledge is preserved under a
polynomial number of repetitions once we take into account oracle-dependent auxiliary 
information. However, we are only able to establish such a statement in the EPRO and 
NPRO models. Technically, the proof breaks down for FPRO zero-knowledge because 
the simulator is not required to specify the simulated random oracle. In particular, this 
shows that sequential composition is more subtle than merely accounting for auxiliary 
input. A natural question that arises is whether prepending a random prefix to all oracle 
queries allows us to transform any protocol that is FPRO zero-knowledge into one 
that remains zero-knowledge under a polynomial number of sequential compositions. 
We note that using a random prefix only guarantees “independence” of the prover’s 
messages across different iterations; a cheating verifier is not limited to queries with the
given prefix.$^{13}$


7 Proofs of Knowledge with Dependent Auxiliary Input 


Several constructions of zero-knowledge protocols begin with the verifier sending a 
proof of knowledge, for instance, that used in our counter-example, and the NPRO zero- 
knowledge protocol in [22]. If we allow the cheating verifier to receive an auxiliary 
input that depends on the random oracle, we would need to also extend the definition of 
proof of knowledge to incorporate auxiliary inputs that depend on the random oracle. 


Definition 2. Let $(P, V)$ be an interactive protocol for a language $L = L_R$. We say
that $(P, V)$ is a proof of knowledge w.r.t. dependent auxiliary input in the RO model
(or auxiliary-input proof of knowledge) if for every oracle PPT $P^*$, there exists an
oracle PPT $E$ such that for all nonuniform oracle PPT $Z$ and for all $x$:

$$\Pr_{RO}\left[Z^{RO}(1^{|x|}) \to z;\ E^{RO}(x, z) \to w;\ (x, w) \in R\right] \ge \Pr_{RO}\left[Z^{RO}(1^{|x|}) \to z;\ (P^{*RO}(z), V^{RO})(x) \text{ accepts}\right] - \mathrm{neg}(|x|)$$


12 Unlike in the standard model, we cannot use an averaging argument to fix the output $(z, \tau_j)$
from $Z_j$. This is because the output depends on RO. We may eliminate the efficient-prover
constraint in the lemma by allowing the auxiliary input machine Z in the definition of zero- 
knowledge to be unbounded, but we do not know how to achieve zero-knowledge without a 
bound on the query complexity of Z. 

13 Specifically, consider the trivial protocol for the language $\{0,1\}^*$ wherein the prover sends
nothing and the (honest) verifier always accepts. Note that using a random prefix does not
affect this protocol in any way. Now, consider a cheating verifier that after each iteration
outputs $RO(0^n)$. The zero-knowledge simulator for a single iteration (without dependent
auxiliary input) may simply output a random string, but simply concatenating the output of 
this simulator for a polynomial number of times does not yield a correct simulation of the view 
of the cheating verifier for a polynomial number of iterations. This highlights the difference 
between simulating the transcript vs the output of the verifier, and the difficulty in ensuring 
“independence” of the random oracles amongst different iterations. 
The next result implies a separation between zero-knowledge proofs of knowledge and
zero-knowledge auxiliary-input proofs of knowledge. Specifically, we rule out non- 
interactive protocols that are simultaneously zero-knowledge and a proof of knowledge 
(in the above sense); otherwise, we can simply apply the knowledge extractor to the 
simulated proof to obtain a candidate witness. Some care is needed in arguing that the 
knowledge extractor works on the simulated proof, which uses a simulated random 
oracle and not the actual one. The reason why this approach only works for the new 
definition of proofs of knowledge is that we allow a cheating prover to receive oracle- 
dependent auxiliary input. In particular, the cheating prover may receive a convincing 
proof as auxiliary input, and the knowledge extractor can neither rewind the auxiliary 
input machine nor observe the oracle queries it makes. The proof is deferred to the full 
version. 


Theorem 6. Assuming the existence of one-way functions, there is a 2-round public- 
coin argument system for NP that is auxiliary-input EPRO zero-knowledge, and also an 
auxiliary-input proof of knowledge. On the other hand, only languages in BPP have a 
non-interactive argument system that is EPRO zero-knowledge and an auxiliary-input 
proof of knowledge. 


8 Circuit Obfuscation in the Random Oracle Model 


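Let $O$ be a probabilistic polynomial-time algorithm and let $\mathcal{C}$ be a family of circuits. Let
$A, S, Z, D$ be oracle PPTs. Given $C \in \mathcal{C}$, we define $\mathrm{diff}_{A,S,Z,D}(C)$ to be the quantity

$$\Pr_{RO}\left[z \leftarrow Z^{O_1}(1^{|C|});\ \tau \leftarrow A^{RO}(O^{RO}(C));\ D^{RO}(\tau, z) = 1\right] - \Pr_{RO}\left[z \leftarrow Z^{O_1}(1^{|C|});\ (\tau, \ell) \leftarrow S^{C, O_2}(1^{|C|});\ D^{O_3}(\tau, z) = 1\right]$$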
Definition 3 (circuit obfuscation INGI). A probabilistic polynomial-time algorithm
$O$ is an obfuscator for the family of circuits $\mathcal{C} = \bigcup_n \mathcal{C}_n$ in the FPRO model
(where $\mathcal{C}_n$ is the subset of circuits in $\mathcal{C}$ that take inputs of length $n$) if the following three
conditions hold:

- (approximate functionality) There exists a negligible function $\alpha$ such that for all
$n$, for all $C \in \mathcal{C}_n$, with probability $1 - \alpha(n)$ over the internal coin tosses of the
obfuscator and over RO, $O^{RO}(C)$ describes a circuit with RO-gates that computes
the same function as $C$.

- (polynomial slowdown) There is a polynomial $p$ such that for every circuit $C \in \mathcal{C}$,
$|O(C)| \le p(|C|)$.

- (virtual black-box property) For every oracle PPT $A$, there exists an oracle PPT
$S$ such that for all $C \in \mathcal{C}$ and for all nonuniform oracle PPTs $Z$ and $D$,
$\mathrm{diff}_{A,S,Z,D}(C)$ is negligible (as a function of $|C|$).


[Diagram relating aux-input obf in FPRO, aux-input obf in EPRO, and aux-input obf in NPRO]


Fig. 2. Relations between different variants of obfuscation in the RO model. An arrow is an
implication, a double-tailed arrow is an equivalence, and a crossed arrow indicates a separation.
We stress that the relations refer to families of circuits that are obfuscatable according to the
respective notions.


We say that $\mathcal{C}$ is FPRO obfuscatable if there exists an obfuscator for $\mathcal{C}$. In addition, we
obtain:


EPRO obfuscatable if $O_1 = \bot$, $O_2 = RO$, $O_3 = RO[\ell]$
NPRO obfuscatable if $O_1 = \bot$, $O_2 = RO$, $O_3 = RO$
auxiliary-input FPRO obfuscatable if $O_1 = RO$, $O_2 = \bot$, $O_3 = \bot$
auxiliary-input EPRO obfuscatable if $O_1 = RO$, $O_2 = RO$, $O_3 = RO[\ell]$
auxiliary-input NPRO obfuscatable if $O_1 = RO$, $O_2 = RO$, $O_3 = RO$


A point function $I_w$ is a boolean function that evaluates to 1 on input $w$ and 0
everywhere else. As observed in [19], to obfuscate $I_w$ in the RO model, we may simply
pick a random $\alpha \in \{0,1\}^{|w|}$ and store $\alpha, RO(\alpha \circ w)$ in the obfuscated circuit, which on
input $x$, outputs 1 iff $RO(\alpha \circ x) = RO(\alpha \circ w)$.
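
A minimal Python sketch of this construction follows. It assumes that SHA-256 stands in for the random oracle and that inputs are byte strings; the function names `ro`, `obfuscate_point` and `evaluate` are hypothetical and chosen only for illustration.

```python
import hashlib
import secrets


def ro(data: bytes) -> bytes:
    """Stand-in for the random oracle (SHA-256, an assumption)."""
    return hashlib.sha256(data).digest()


def obfuscate_point(w: bytes):
    """Pick a random alpha of the same length as w and store
    (alpha, RO(alpha o w)) as the obfuscated circuit."""
    alpha = secrets.token_bytes(len(w))
    return alpha, ro(alpha + w)


def evaluate(obfuscated, x: bytes) -> int:
    """Output 1 iff RO(alpha o x) equals the stored oracle value."""
    alpha, target = obfuscated
    return 1 if ro(alpha + x) == target else 0
```

For example, `evaluate(obfuscate_point(b"secret"), b"secret")` returns 1, while any other input returns 0 except in the event of a hash collision.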


Theorem 7 (obfuscating point functions [19]). There exists an auxiliary-input FPRO
obfuscator for the class of point functions.


One may expect some modification of the previous construction to yield an EPRO 
obfuscator for point functions, but this turns out to be impossible. The next result 
follows from a similar characterization in for NPRO obfuscation: 


Theorem 8 (triviality). A family of circuits $\mathcal{C} = \bigcup_n \mathcal{C}_n$ is EPRO obfuscatable iff $\mathcal{C}$ is
efficiently and exactly learnable using membership queries.


Proof (sketch). Suppose C is efficiently and exactly learnable using membership 
queries. Consider an obfuscator that simply takes the input circuit C and outputs the 
circuit produced by the learning algorithm given oracle access to C; the simulator does 
essentially the same thing. 

The learning algorithm for an EPRO obfuscatable family of circuits $\mathcal{C}$ is very simple.
To evaluate $C \in \mathcal{C}$ on input $x$ (given oracle access to $C$ and input $x$), run the simulator
$S$ for the trivial adversary $A$ that merely outputs the obfuscated circuit to obtain $(\tau, \ell)$,
answering queries to RO with random coin tosses, and then evaluate $\tau$ on the input
$x$ using the simulated oracle $RO[\ell]$. This may be modified via standard techniques
(specifically, we will need to amplify the soundness error via repetition and then take
a union bound over all inputs) to yield a learning algorithm that on oracle access to $C$
outputs w.h.p. a (standard) circuit that agrees with $C$ on all inputs.
Abstract. A universally composable (UC) blind signature functionality
requires users to commit to the message to be blindly signed. It
is thereby impossible to realize in the plain model. We show that even
non-committing variants of UC blind signature functionality remain
unrealizable in the plain model. We then characterize adaptively secure UC
non-committing blind signatures in the common reference string model
by presenting equivalent stand-alone security notions. We also present a
generic construction based on Fischlin's conceptually simple blind
signature scheme.


1 Introduction 


BACKGROUND. Since the introduction of blind signatures, a vast number of
papers have been devoted to efficient constructions, security analysis, and extensions.
Major applications include untraceable payment systems [9| and anonymous vot- 
ing [03]. The standard notions of security for blind signature schemes in the 
stand-alone setting are blindness and unforgeability DETS]. Universal compos- 
ability (UC) framework [B] offers security in more general setting where other 
arbitrary protocols are running concurrently. It asserts that the properties
provided by an idealized functionality are retained even under general composition. A
blind signature functionality is first suggested by Canetti in and formally 
defined by Fischlin in [I] with a round-optimal realization in the common ref- 
erence string (CRS) model. Kiayias and Zhou study adaptive security in [9]. 

In known blind signature functionalities, e.g., I9], a user commits to a 
message to request a signature. Then a signature is issued by the functionality 
remotely from the view of the signer. In [LJ], Fischlin pointed out that a UC blind 
signature protocol that realizes such a functionality implies a UC commitment 
protocol in the static corruption model and is thus impossible to realize in the plain
model [7]. A more formal argument is given by Lindell in ZOZI]. A common idea 
for these arguments is that the existence of a simulator implies extraction of the
input message and hence contradicts blindness.

Is there a hope to circumvent the above impossibility if the functionality is 
relaxed by giving up the commitment property? In some applications such as a 
simple e-cash or a coupon system, every message can be a random string that 
the users do not need to know or fix in advance. Such applications only concern
blindness and unforgeability. In [2], Buan, Gjøsteen, and Krakmo presented a
non-committing blind signature functionality where corrupt users no longer de- 
posit messages. Thus there is no need to extract the messages for simulation. It 
was shown that such a non-committing blind signature functionality is realizable 
in the plain model and the presented security is equivalent to the unforgeability 
and weak blindness defined by Juels, Luby and Ostrovsky in [X]. 


OUR CONTRIBUTION. Seemingly in contradiction to this, we show that universally
composable non-committing blind signatures are still impossible in the plain model.
Our proof shows that if the functionality provides blindness, the presence of a
simulator contradicts unforgeability in the plain model. Importantly, the
positive result in [2] stands only in a restricted corruption model where the signer
can be corrupted only after the key generation process. As stated in the paper,
such a restriction is so strong that it is equivalent to incorporating a trusted
party in the protocol. Our result holds for the most general corruption model.
It is also robust in the sense that it applies to a wide variety of blind
signature functionalities that formulate blindness in a reasonable way, as all existing
functionalities do.

Despite the negative result, non-committing blind signatures remain an in- 
teresting cryptographic object to study. The less demanding functionality would 
allow simple protocol designs in advanced models. This paper presents a thor- 
ough characterization of a non-committing blind signature functionality that is 
secure against adaptive adversaries without secure erasures. We prove that the
properties captured by the functionality are equivalent to a pair of stand-alone
security notions in the common reference string model, which are the standard
unforgeability and a new strong notion of blindness which we call equivocal
simulation blindness. We then decompose equivocal simulation blindness into
handier notions called session equivocality and signature equivocality in a specific
setting. We also show a generic construction. The simplicity of our framework
can be highlighted when compared to the result on the adaptive security for the 
committing blind signatures [9]. 

Due to lack of space, most proofs are moved to the full version [I], which also 
includes results in the static corruption model. 


2 Notations 


All algorithms in this paper run in polynomial time in the security parameter $\lambda$.
By $y \leftarrow A(x; r)$ we mean that algorithm $A$ is invoked with input $x$ and uniformly
chosen randomness $r$, and outputs something labeled as $y$. Randomness $r$ may be
omitted. By $(a, b) \leftarrow \langle A(x), B(y) \rangle$ we denote an execution of interactive Turing
machines $A$ and $B$ on input $x$ and $y$ and with output $a$ and $b$, respectively. When
only one side of the output is of concern, we write $a \leftarrow \langle A(x), B(y) \rangle_L$ for the
left side and $b \leftarrow \langle A(x), B(y) \rangle_R$ for the right side. We write $a[w] \leftarrow A$ when $A$
has some extra output $w$. The meaning of $w$ depends on the context and will be
noted whenever this notation is used. For notations and notions related to the
UC framework we refer to f]. 
3 Blind Signature Schemes


3.1 Syntax and Standard Security Notions 


A blind signature scheme BS in the common reference string model consists of 
five algorithms BS = BS.{Crs, Key, User, Signer, Vrf}. BS.Crs is a common refer- 
ence string generator. BS.Key is a key generator. BS.User is an interactive signa- 
ture request algorithm and BS.Signer is a signing algorithm. Interaction between 
BS.User and BS.Signer forms a signature generation protocol. BS.Vrf is a signa- 
ture verification algorithm. A blind signature scheme must provide completeness 
and consistency. Roughly, completeness is that for any $(m, \sigma)$ made faithfully
through BS.Crs, BS.Key, BS.User, and BS.Signer, verification algorithm BS.Vrf 
outputs 1. Consistency is that BS.Vrf outputs the same value for the same input 
(even for keys generated by an adversary). We refer to [A for details and dis- 
cussions on these properties. Two standard security notions are unforgeability 
and blindness as shown below. 


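Definition 1 (Unforgeability: UF). A blind signature scheme BS is unforgeable
if $\mathrm{Succ}^{F^*}_{\mathrm{BS}}(\lambda) = \Pr[\mathrm{Forge}^{F^*}_{\mathrm{BS}}(\lambda) = 1]$ is negligible in $\lambda$ for any algorithm
$F^*$, where $\mathrm{Forge}^{F^*}_{\mathrm{BS}}$ is the experiment shown below. $F^*$ can access the oracle
an arbitrary number of times concurrently.


Experiment $\mathrm{Forge}^{F^*}_{\mathrm{BS}}(\lambda)$:
$\Sigma \leftarrow \mathrm{BS.Crs}(1^\lambda)$
$(vk, sk) \leftarrow \mathrm{BS.Key}(\Sigma)$
$((m_1, \sigma_1), \ldots, (m_{k+1}, \sigma_{k+1})) \leftarrow F^{*\langle \cdot, \mathrm{BS.Signer}(\Sigma, sk) \rangle}(\Sigma, vk)$
Return 1 iff
$\mathrm{completed} \leftarrow \langle \cdot, \mathrm{BS.Signer}(\Sigma, sk) \rangle_R$ happens at most $k$ times, and
$m_i \ne m_j$ for all $1 \le i < j \le k + 1$, and
$\mathrm{BS.Vrf}(\Sigma, vk, m_i, \sigma_i) = 1$ for all $1 \le i \le k + 1$.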


Strong unforgeability (sUF) is defined in the same way but requiring $(m_i, \sigma_i) \ne
(m_j, \sigma_j)$ instead of $m_i \ne m_j$. This paper focuses on the above relatively weaker
notion as it suffices for major applications.
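
The winning condition of the unforgeability experiment, in both its weak and strong form, can be written as a small Python predicate. This is a sketch under assumptions: `verify` is a hypothetical callable standing in for BS.Vrf with the common reference string and verification key already fixed, and the number of completed signer sessions is supplied by the caller rather than observed from a real protocol run.

```python
def forgery_wins(pairs, completed_sessions, verify, strong=False):
    """Evaluate the 'Return 1 iff' clause of the unforgeability experiment.

    pairs: list of (message, signature) pairs output by the forger (k + 1 of them).
    completed_sessions: how many signer sessions output `completed`.
    verify: callable (message, signature) -> bool, standing in for BS.Vrf.
    strong: if True, require distinct pairs instead of distinct messages.
    """
    if not pairs:
        return 0
    k = len(pairs) - 1
    # The signer may have completed at most k sessions.
    if completed_sessions > k:
        return 0
    # Weak notion: messages pairwise distinct; strong notion: pairs distinct.
    distinct = set(pairs) if strong else {m for m, _ in pairs}
    if len(distinct) != len(pairs):
        return 0
    # Every output pair must verify.
    return 1 if all(verify(m, s) for m, s in pairs) else 0
```

The `strong` flag switches between the two notions: two different signatures on the same message count as a forgery only in the strong variant.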


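Definition 2 (Blindness: BL). A blind signature scheme BS is blind if
$\mathrm{Adv}^{B^*}_{\mathrm{BS}}(\lambda) = |\Pr[\mathrm{Blind}^{B^*}_{\mathrm{BS}}(\lambda, 0) = 1] - \Pr[\mathrm{Blind}^{B^*}_{\mathrm{BS}}(\lambda, 1) = 1]|$ is negligible in $\lambda$
for any algorithm $B^*$, where $\mathrm{Blind}^{B^*}_{\mathrm{BS}}$ is the experiment shown below.


$\mathrm{Blind}^{B^*}_{\mathrm{BS}}(\lambda, b)$:
$\Sigma \leftarrow \mathrm{BS.Crs}(1^\lambda)$
$(vk, m_0, m_1) \leftarrow B^*(\Sigma)$
$\sigma_b \leftarrow \langle \mathrm{BS.User}(\Sigma, vk, m_b), B^* \rangle_L$
$\sigma_{1-b} \leftarrow \langle \mathrm{BS.User}(\Sigma, vk, m_{1-b}), B^* \rangle_L$
If $\sigma_0 = \bot$ or $\sigma_1 = \bot$ then set $\sigma_0 = \sigma_1 = \bot$.
Return $\tilde{b} \leftarrow B^*(\sigma_1, \sigma_0)$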
For ease of notation, we represent algorithm B* as stateful so that it implicitly
takes over its internal state from the previous invocation every time it is invoked 
by the experiment. Only new inputs are explicitly presented in the description. 
This convention is applied to all algorithms denoted with asterisk (*) throughout 
the paper. 

As observed in [I7], the above definition captures the case where the adversary 
attempts to get useful information by aborting the sessions. [2] extends the
notion in such a way that, when adversary $B^*$ is given $(\bot, \bot)$ at the end, it is
given an extra piece of information that tells which session (the first or second
or both) actually yields $\bot$ on the user side. The results in this paper also hold
with respect to the stronger notion of blindness.
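
The control flow of the blindness experiment, including the rule that an abort in either session hides both signatures, can be sketched in Python. The sketch is schematic and rests on assumptions: the adversary is a hypothetical object with methods `choose` and `guess`, `run_user` is a hypothetical callable that runs the user side of one session and returns a signature or `None` (standing in for the failure symbol), the common reference string is left implicit, and the two signatures are handed to the adversary as a dictionary indexed by message number.

```python
def blindness_experiment(b, adversary, run_user):
    """Schematic version of the blindness experiment for challenge bit b.

    adversary.choose() -> (vk, m0, m1)
    adversary.guess(sigs) -> bit, where sigs maps 0 and 1 to the signature
        on m0 and m1 (or None for both after an abort)
    run_user(vk, m) -> signature, or None if the session fails
    """
    vk, m0, m1 = adversary.choose()
    messages = (m0, m1)
    sigs = [None, None]
    # The first session is run on m_b, the second on m_{1-b}.
    sigs[b] = run_user(vk, messages[b])
    sigs[1 - b] = run_user(vk, messages[1 - b])
    # If either session failed, the adversary learns neither signature.
    if sigs[0] is None or sigs[1] is None:
        sigs = [None, None]
    return adversary.guess({0: sigs[0], 1: sigs[1]})
```

The abort rule is what prevents an adversary from learning the order of the sessions simply by making one of them fail.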


4 UC Non-committing Blind Signatures 


4.1 Functionality $\mathcal{F}_{\mathrm{ncb}}$


Figure 1 illustrates our non-committing blind signature functionality $\mathcal{F}_{\mathrm{ncb}}$. In the
figure, $v$ is a deterministic signature verification algorithm and $\Pi$ is a description of a
stateless signing algorithm. See [6] for remarks on running arbitrary algorithms in
a functionality. As with the ordinary signature functionality in [5], we formulate
$\mathcal{F}_{\mathrm{ncb}}$ not to provide any security properties if an unregistered verification key
is given as input to the signature generation and verification phases. See the
discussion about the key management below.

The idea of using counters to enforce unforgeability is the same as that in
[2]. Because the counters are incremented at different times, our
formulation can live with the general communication model thoroughly controlled
by the adversary, while the one in [2] needs authenticated communication in its
realization. Note that the bare signature functionality in [5] can be realized
without an authenticated channel because there is no link between the public key and
the identity of the signer, and it does not matter who issues a signature as long
as the signature is valid.


NON-COMMITTING PROPERTY. Observe that the input message $m$ from a corrupt
user is neither sent anywhere nor stored in the functionality. Thus $S$, working on behalf
of a corrupt user, can complete the signature generation process whatever $m$ is.
This formulation avoids the need to extract the message from
corrupt users.


UNFORGEABILITY. This property holds only while signer $P_s$ is honest. Counter
$C_{\mathrm{compl}}$ counts the number of completed signature generations on the signer's side,
while counter $C_{\mathrm{valid}}$ counts the number of valid signatures on distinct messages
received by honest users with legitimate verification. The verification process
accepts signatures on new messages only if $C_{\mathrm{compl}} > C_{\mathrm{valid}}$. From the specification,
it is clear that $C_{\mathrm{compl}} \ge C_{\mathrm{valid}}$ always holds as long as the signer is honest. Thus
unforgeability is guaranteed in the absolute sense. To capture weak
unforgeability, $C_{\mathrm{valid}}$ is incremented only for unique messages in the signature generation
Key Generation : Given (KeyGen, sid) from a party $P_s$, verify that sid =
($P_s$, sid') for some sid'. If not, then ignore. Else, forward (KeyGen, sid)
to simulator $S$. Then, on receiving (Generated, sid, $v$, $\Pi$) from $S$, send
(Generated, sid, $v$) to $P_s$ and record ($P_s$, $v$, $\Pi$). Let $C_{\mathrm{compl}} = C_{\mathrm{valid}} = 0$, and
$\Gamma$ be empty. This phase must be completed only once and before other phases.

Signature Generation : On receiving (Request, sid, ssid, $v'$, $m$) for some $m$ from
$P_u$, send (Request, sid, ssid, $v'$) to $S$ and do the following.

i. On receiving (Signed, sid, ssid) from $S$, forward it to $P_s$. Set $C_{\mathrm{compl}} \leftarrow
C_{\mathrm{compl}} + 1$.
ii. On receiving (Received, sid, ssid) from $S$, do as follows:
- If $P_u$ is honest and $v' = v$, then do as follows. If $(m, *, 1) \notin \Gamma$, set
$C_{\mathrm{valid}} \leftarrow C_{\mathrm{valid}} + 1$. Compute $\sigma \leftarrow \Pi(m)$ and record $(m, \sigma, 1)$ to $\Gamma$.
If $(m, \sigma, 0) \in \Gamma$, send an error message to signer $P_s$ and halt. Send
(Received, sid, ssid, $\sigma$) to $P_u$.
- Else if $P_u$ is corrupt or $v' \ne v$, ask $S$ and forward to $P_u$ whatever is received
from $S$.

Signature Verification : On receiving (Verify, sid, ssid, $v'$, $m$, $\sigma$) from some
party $P_v$, set $\varphi = v'(m, \sigma)$ and do as follows.
1. If $v' \ne v$, set $f = \varphi$.
2. Else if $(m, \sigma, f') \in \Gamma$ for any $f'$, then set $f = f'$.
3. Else if $P_s$ is corrupt or $(m, *, 1) \in \Gamma$, then set $f = \varphi$ and record $(m, \sigma, f)$
to $\Gamma$.
4. Otherwise:
(a) If $C_{\mathrm{compl}} > C_{\mathrm{valid}}$, then set $f = \varphi$ and $C_{\mathrm{valid}} \leftarrow C_{\mathrm{valid}} + f$.
(b) Otherwise, set $f = 0$.
Then record $(m, \sigma, f)$ to $\Gamma$.
Output (Verified, sid, ssid, $f$) to $P_v$.

Player Corruption : On receiving corruption to $P_u$, send all inputs and
outputs exchanged with $P_u$ to simulator $S$. Also send all randomness used in the
evaluations of $\Pi$ with respect to $P_u$.


Fig. 1. Non-committing blind signature functionality $\mathcal{F}_{\mathrm{ncb}}$


process (see step (ii)). Strong unforgeability can be captured by removing the
conditions “if $(m, *, 1) \notin \Gamma$” and “or $(m, *, 1) \in \Gamma$” from the signature generation
and verification phases, respectively.
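
The interplay of the two counters can be illustrated with a toy Python model of the bookkeeping in Fig. 1. The sketch covers only the case of a registered verification key and leaves out all message passing, sessions and the simulator. The class name `CounterBookkeeping`, the dictionary used for the records, and the reading of step 3 as "the signer is corrupt" are assumptions made for the illustration, and `phi` is the bit that the verification algorithm outputs on the pair being checked.

```python
class CounterBookkeeping:
    """Toy model of the completed/valid counters and the record set."""

    def __init__(self):
        self.c_compl = 0      # completed signature generations (signer side)
        self.c_valid = 0      # valid signatures on distinct messages
        self.records = {}     # (m, sigma) -> recorded verification bit f

    def _signed(self, m):
        # True if some record (m, *, 1) exists.
        return any(f == 1 and mm == m for (mm, _), f in self.records.items())

    def signer_completed(self):
        """Step (i) of signature generation."""
        self.c_compl += 1

    def user_received(self, m, sigma):
        """Step (ii) for an honest user and the registered key."""
        if not self._signed(m):
            self.c_valid += 1
        if self.records.get((m, sigma)) == 0:
            raise RuntimeError("signature was previously rejected")
        self.records[(m, sigma)] = 1

    def verify(self, m, sigma, phi, signer_corrupt=False):
        """Verification for the registered key (steps 2 to 4)."""
        if (m, sigma) in self.records:              # step 2
            return self.records[(m, sigma)]
        if signer_corrupt or self._signed(m):       # step 3
            f = phi
        elif self.c_compl > self.c_valid:           # step 4(a)
            f = phi
            self.c_valid += f
        else:                                       # step 4(b)
            f = 0
        self.records[(m, sigma)] = f
        return f
```

In this model a signature on a new message is accepted only while the number of completed signing sessions exceeds the number of valid signatures counted so far, so the second counter never overtakes the first when the signer is honest.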


COMPLETENESS AND CONSISTENCY. If the signer and a user are not corrupted 
and the registered key is given as input to the signature generation phase, 
$(m, \sigma, 1)$ is recorded. The verification phase for such faithfully generated $(m, \sigma)$
and registered $v$ finds that record and always outputs $f = 1$. Thus
completeness is captured. Consistency holds for free since algorithm $v$ is deterministic.
Limiting v to be deterministic loses generality but makes the exposition con- 
siderably simpler. For issues with respect to probabilistic verification algorithms 


see bLA. 


BLINDNESS. The important observations are: 1. $\Pi$ is fixed before any sub-session
for signature generation starts, 2. $\Pi$ takes nothing but message $m$ as input,
and 3. Message $m$ and $\Pi(m)$ are never sent to $S$ or $P_s$ during the signature
generation phase. This formulation thereby assures that the remotely computed $\sigma$ is
independent of the signature generation viewed by the signer. Such a mechanism,
which we call remote signing, is suggested in H] and employed by all known blind 
signature functionalities. 


ON KEY MANAGEMENT. The “bare” signature functionality in is formu- 
lated in such a way that it stores a single public-key in every session and the 
security properties are guaranteed only for the registered public key. The
functionality enjoys concise presentation and high modularity. We adopt this
approach to define $\mathcal{F}_{\mathrm{ncb}}$. Namely, if an unregistered $v'$ is given as input to the
signature generation or verification phase, $\mathcal{F}_{\mathrm{ncb}}$ behaves just as $S$ intends. So
even though a user is honest, no security is guaranteed in such a case.
(Recall that the environment can pass an arbitrary $v'$ to an honest user.) Accordingly,
upper-level protocols that use $\mathcal{F}_{\mathrm{ncb}}$ are responsible for providing the registered $v$
to the honest users.

An alternative formulation would be to let $\mathcal{F}_{\mathrm{ncb}}$ explicitly reject an
unregistered $v'$. This, however, results in incorporating a mechanism for distributing the
correct public key within the blind signature protocol. For instance, the
protocol realizing $\mathcal{F}_{\mathrm{ncb}}$ may be constructed in the $\mathcal{F}_{\mathrm{ca}}$-hybrid model, where $\mathcal{F}_{\mathrm{ca}}$ is the
certificate-authority functionality that serves only the blind signature
protocol. Though this kind of issue can be handled by the theorem of universal
composition with joint state [B], we prefer $\mathcal{F}_{\mathrm{ncb}}$ to be basic for the sake of higher
modularity.

Earlier works in the literature implicitly follow the same approach as ours. They, however,
define their functionality only for the case of receiving the registered public-key 
as input to the signature generation phase. It results in simpler presentation but 
eventually the details need to be provided with care. shows more extended 
functionality such that it handles several public-keys under the same session-id 
and guarantees blindness for every set of signatures issued with the same public- 
key. This approach however suffers high complexity in its presentation. 


VARIATIONS. F_ncb in Fig. 1 notifies the environment only of the end of the
signature generation process. It can be extended so that the environment can
give the signer explicit approval or denial for starting the process by adding
another round of interaction among S, F_ncb, and P_s. It is also possible to let
the environment know about the abnormal termination of the protocol in the
same way. These modifications do not affect the results in this paper since
they can be incorporated simply by modifying the protocol wrapper in Section 5.2
accordingly.


4.2 Impossibility in the Plain Model 


This section shows that F_ncb cannot be realized without access to extra ideal
functionalities or some help from incorruptible parties. To make the
statement meaningful, we consider non-trivial protocols where honest parties
running the protocol with the right inputs terminate and output something with
noticeable probability.

Theorem 1. There exists no non-trivial protocol that securely realizes F_ncb in
the plain model.


Proof. We use S to extract the remote signing function Π and use it to break
unforgeability in the real protocol. Recall that a forgery can never happen
in the ideal model. Thus Z can distinguish the ideal process from a real protocol
execution by observing a successful forgery.

Suppose that there exists a non-trivial protocol π that realizes F_ncb in the
plain model. Recall that F_ncb is invoked when it receives (KeyGen, sid) from a
signer. It then outputs (Generated, sid, v) to the signer. Protocol π works in the
same way since it realizes F_ncb. Let π_kg denote the part of π that receives
(KeyGen, sid) as input and outputs (Generated, sid, v).

Consider a particular A* and Z* that behave in EXEC_{π,A*,Z*} as follows. Z*
first asks A* to corrupt the signer. Z* then runs π_kg with input (KeyGen, sid)
and obtains (Generated, sid, v). (Here, without loss of generality, we assume
that π_kg can be run solely by the signer up to the moment (Generated, sid, v)
is output. See the discussion after the proof for the generalization.) Z* then
sends (KeyGen, sid) and v to A* and receives (Generated, sid, v) from A* working
on behalf of the corrupt signer. Z* then asks for a signature on a message m by
sending (Request, sid, ssid, v, m) to an honest user. If A* is to join π on behalf
of the signer to generate a signature, Z* takes over the role and completes the
protocol by faithfully following π. The user eventually outputs
(Received, sid, ssid, σ). Finally Z* sends (Verify, sid, ssid, v, m, σ) to a user
and receives (Verified, sid, ssid, f) as a result of verification. Observe that,
even though the signer is corrupted, Z* simulates an honest signer by following
π. Furthermore, due to the completeness and terminating property of π, Z* can
complete signature generation with noticeable probability. If Z* completes,
f = 1 appears at the end. Since π realizes F_ncb, there exists a simulator S* for
such A* and Z*. To successfully simulate A*, simulator S* has to send Π to F_ncb
before Z* sends (Request, sid, ssid, v, m) to an honest user. Furthermore, with
noticeable probability, Π(m) must yield a valid signature accepted by protocol π.

Now we construct Z that distinguishes EXEC_{π,A,Z} and IDEAL_{F_ncb,S,Z} by
using the above S* as a subroutine. Z first sends (KeyGen, sid) to the honest
signer and receives (Generated, sid, v). Then Z starts simulating Z*. It asks S*
to corrupt the simulated signer. Then it sends (KeyGen, sid) and v to S* and
receives (Generated, sid, v, Π) from S* on behalf of F_ncb. Now Z computes
σ ← Π(m) for some m. It then sends (Verify, sid, ssid, v, m, σ) to a verifier and
receives (Verified, sid, ssid, f). The output of Z is f.

Let us evaluate Z. Suppose that Z is in EXEC_{π,A,Z}. Z simulates Z* perfectly
for S*. In particular, v in this case is generated honestly by π just as
Z* does. So S* outputs (Generated, sid, v, Π) as expected. Then with noticeable
probability such Π yields σ that passes the verification protocol of π. Thus
f = 1 happens with noticeable probability in this case. Next suppose that Z is
in IDEAL_{F_ncb,S,Z}. In this case, v is generated by S. If it is distinguishable
from the one observed in EXEC_{π,A,Z}, Z distinguishes EXEC_{π,A,Z} and
IDEAL_{F_ncb,S,Z} on that basis. If it is indistinguishable, S* outputs
(Generated, sid, v, Π) as well as in the previous case. Since no signature
generation process is completed in IDEAL_{F_ncb,S,Z} and F_ncb provides absolute
unforgeability, F_ncb rejects σ generated by Π. Thus f = 0 in this case.
Accordingly Z distinguishes EXEC_{π,A,Z} and IDEAL_{F_ncb,S,Z} with noticeable
probability. □


An essential point is that F_ncb demands that S extract Π even from a corrupt
signer for the sake of blindness. But the successful extraction of Π contradicts
unforgeability. The situation is very similar to the case of UC commitments [Z],
where the message from a corrupt committer must be extracted for the
sake of the binding property, and the successful extraction contradicts the
hiding property.

The proof does not go through if protocol π involves incorruptible trusted
parties or any extra ideal functionalities. The point is that Z* should be able to
run π_kg by itself so that the distribution of v is solely under its control. This
allows Z to simulate Z* simply by sending v generated outside of Z. If π_kg
involves parties other than the signer, Z* corrupts them before they send off
any message and simulates them honestly by following π_kg. When Z simulates
Z*, these corrupted parties are simulated by following the behavior of the real
uncorrupted players Z is working with.


5 Characterization 


5.1 Blindness Based on Simulatability 


The following new notion called simulation blindness assures that the signature 
generation protocol can be executed without knowing the message. Similarly, 
the resulting signature can be generated without involving any information from 
the protocol run. To capture adaptive security, we require a state reconstruction
property. We use the term equivocal when a notion involves a state reconstruction
property.


Definition 3 (Equivocal Simulation Blindness: EqSimBLND). A blind
signature scheme BS is equivocal simulation blind if there exists a set of
algorithms SIM = SIM.{Crs, User, Sig, State} such that SIM.User and SIM.State
can be stateful, SIM.Sig is stateless, and the advantage
Adv_{BS,D*}(λ) = |Pr[EqSimBL_{BS,D*}(λ, 0) = 1] − Pr[EqSimBL_{BS,D*}(λ, 1) = 1]|
is negligible in λ for any D*, where EqSimBL_{BS,D*}(λ, b) is the following
experiment. Oracles are accessible in an arbitrary manner.


EqSimBL_{BS,D*}(λ, 1):
    Σ ← BS.Crs(1^λ)
    vk ← D*(Σ)
    b ← D*^{O_1(Σ, vk, ·)}
    Return b

  O_1(Σ, vk, m):
    σ ← ⟨BS.User(Σ, vk, m; r), D*⟩
    Output (σ, r)


EqSimBL_{BS,D*}(λ, 0):
    (Σ, t) ← SIM.Crs(1^λ)
    vk ← D*(Σ)
    b ← D*^{O_0(Σ, vk, ·, t)}
    Return b

  O_0(Σ, vk, m, t):
    δ[ω_u] ← ⟨SIM.User(Σ, vk, t), D*⟩
    σ[ω_s] ← SIM.Sig(Σ, vk, m, t)
    r ← SIM.State(ω_u, ω_s)
    If δ = 0, then set σ = ⊥.
    Output (σ, r)


Here ω_u and ω_s denote the state information of SIM.User and SIM.Sig,
respectively.


Note that SIM.State is supposed to simulate the randomness even for the case
where the interaction between SIM.User and D* is terminated abnormally.
SIM.State can see how the interaction is terminated by seeing the state
information ω_u.

It would be more useful if we could present separate notions of simulatability
for simulating the view of sessions by SIM.User and the signatures by SIM.Sig.
We call these notions session equivocality and signature equivocality. This
separation is, however, not possible in general, since SIM.User and SIM.Sig use
the same trapdoor as input and may negatively influence each other when they are
used at the same time. We thus consider a special case where trapdoors are
separated as (t_1, t_2), and SIM.Sig and SIM.User are each run with only one of
the two parts (t_1 for SIM.Sig and t_2 for SIM.User in Definitions 5 and 6). With
respect to the separate trapdoor generator we define two notions of
simulatability.


Definition 4 (Separable Trapdoor Generator). SIM.Crs is a separable trapdoor
generator if it outputs (Σ, (t_1, t_2)) such that Σ is indistinguishable from
those generated by BS.Crs, i.e. any algorithm C* distinguishes them with only
negligible advantage, say Adv_{BS,C*}(λ).


Definition 5 (Signature Equivocality: SigEq). A blind signature scheme BS
is signature equivocal if there exist algorithms SIM.Sig and SIM.SigState such
that the advantage function Adv_{BS,A*}(λ) = |Pr[SigEQ_{BS,A*}(λ, 0) = 1] −
Pr[SigEQ_{BS,A*}(λ, 1) = 1]| is negligible in the security parameter λ for any A*,
where SigEQ_{BS,A*}(λ, b) is the following experiment.


SigEQ_{BS,A*}(λ, b):
    (Σ, (t_1, t_2)) ← SIM.Crs(1^λ)
    vk ← A*(Σ)
    b̃ ← A*^{O_b(Σ, vk, ·, t_1)}
    Return b̃

  O_1(Σ, vk, m, t_1):
    σ ← ⟨BS.User(Σ, vk, m; r_1||r_2), A*⟩
    Output (σ, r_1||r_2)

  O_0(Σ, vk, m, t_1):
    σ[θ] ← ⟨BS.User(Σ, vk, m; r_1||r_2), A*⟩
    σ’[ω_s] ← SIM.Sig(Σ, vk, m, t_1)
    r_1’ ← SIM.SigState(θ, ω_s)
    If σ = ⊥, then σ’ = ⊥ and r_1’ = r_1.
    Output (σ’, r_1’||r_2)


Symbol θ is the transcript observed by BS.User, and ω_s is the state information
of SIM.Sig.


Definition 6 (Session Equivocality: SesEq). A blind signature scheme BS
is session equivocal if there exist algorithms SIM.User and SIM.SesState such
that the advantage function Adv_{BS,E*}(λ) = |Pr[SesEQ_{BS,E*}(λ, 0) = 1] −
Pr[SesEQ_{BS,E*}(λ, 1) = 1]| is negligible in λ for any algorithm E*, where
experiment SesEQ_{BS,E*} is the following.


SesEQ_{BS,E*}(λ, b):
    (Σ, (t_1, t_2)) ← SIM.Crs(1^λ)
    vk ← E*(Σ, t_1)
    b̃ ← E*^{O_b(Σ, vk, ·, t_2)}
    Return b̃

  O_1(Σ, vk, m, t_2):
    ⟨BS.User(Σ, vk, m; r_1||r_2), E*⟩
    Return r_2

  O_0(Σ, vk, m, t_2):
    [ω_u] ← ⟨SIM.User(Σ, vk, t_2), E*⟩
    r_2 ← SIM.SesState(ω_u, m)
    Return r_2


Oracle O_b receives a message m from E* and interacts with E*. Symbol ω_u is
the state information of SIM.User.


In Definition 5 it is assumed that the randomness r used in BS.User can be
separated into two parts r_1 and r_2. The intuition is that r_2 is used while
interacting with the signer and r_1 is used after receiving the final message
from the signer for computing the output signature. This treatment does not lose
generality as one can set either part to be empty. Regarding Definition 6, we
stress that the messages and the resulting signatures are not given to E*. Also
note that trapdoor t_1 is given to E*.

We now show relations between the standard blindness and simulation blind- 
ness. Since simulation blindness captures blindness in a very strong way, it seems 
natural that the following lemma holds. Proofs for the following lemmas are in [J]. 


Lemma 1 (EqSimBLND ⇒ BL). If BS is equivocal simulation blind then it is
blind.


Proof is done in a standard way. We construct D* that successfully breaks equiv- 
ocal simulation blindness by using B* that breaks blindness. 

Regarding the reverse direction, we do not know whether blindness alone implies
simulation blindness. We can, however, show that there exists a scheme
that is blind and unforgeable but not simulation blind. Namely, for schemes
that provide both blindness and unforgeability, simulation blindness is a
strictly stronger notion than blindness. This implication is limited but
sufficiently meaningful since we are interested in schemes that provide both
blindness and unforgeability. The proof can be done in a similar way to that of
Theorem 1.


Lemma 2 (BL ∧ UF ⇏ EqSimBLND). There exists BS that is blind and
unforgeable but not equivocal simulation blind.


The following lemma states that it suffices to consider simulatability of
sessions and signatures individually when trapdoors are separable for each
purpose.


Lemma 3 (SesEq ∧ SigEq ⇒ EqSimBLND). If BS has a separable trapdoor
generator and is signature equivocal and session equivocal with respect to the
generator, then BS is equivocal simulation blind.


The proof is done through three steps of game transformations going from
EqSimBL_{BS,D*}(λ, 1) to EqSimBL_{BS,D*}(λ, 0).


5.2 Protocol Wrapper Wrap() 


In Fig. 2 we show how to transform a blind signature scheme BS into a blind
signature protocol by applying a simple wrapper algorithm, Wrap(). The resulting
protocol Wrap(BS) is in the F_crs-hybrid model, where F_crs is the CRS generation
and distribution functionality whose output distribution is defined by BS.


Blind Signature Protocol Wrap(BS) in the F_crs-model


Key Generation: Upon receiving (KeyGen, sid) from the environment Z, a
party P_s verifies that sid = (P_s, sid’) for some sid’. If not, it does nothing.
Else, P_s obtains the CRS Σ from F_crs, computes (vk, sk) ← BS.Key(Σ) and outputs
(Generated, sid, v) where v(m, σ) = BS.Vrf(Σ, vk, σ, m).


Blind Signature Generation: Parties P_u and P_s do the following.

P_u-side: On receiving (Request, sid, ssid, v’, m) from Z, obtain Σ from F_crs,
take vk’ out of v’, send (Request, sid, ssid, v’) to P_s, invoke
BS.User(Σ, vk’, m), and interact with P_s. If BS.User outputs σ such that
BS.Vrf(Σ, vk’, σ, m) = 1, then output (Received, sid, ssid, σ).


P_s-side: On receiving (Request, sid, ssid, v’) from a user P_u, obtain Σ from
F_crs, invoke BS.Signer(Σ, sk) and interact with P_u. If BS.Signer outputs
completed, then output (Signed, sid, ssid).


Signature Verification: On receiving (Verify, sid, ssid, v’, m, σ) from Z, a party
P_v obtains Σ from F_crs, takes vk’ out of v’, computes f ← BS.Vrf(Σ, vk’, σ, m),
and outputs (Verified, sid, ssid, f).


Common Reference Functionality F_crs


CRS Generation: On receiving (CrsGen, sid) for the first time, F_crs computes
Σ ← BS.Crs(1^λ) and returns Σ. For further requests, it simply returns the same Σ.


Fig. 2. UC blind signature protocol transformed from stand-alone scheme BS 


Note that the resulting protocol does not implement any mechanism to verify
the given verification algorithm v’. It works as intended if v’ = v but no security
is guaranteed for the user if v’ ≠ v. Also note that the signer ignores v’ given
by the user and uses the genuine secret key sk.
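
The behaviour of F_crs and the session-id check in key generation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CrsFunctionality`, `toy_bs_crs` and `key_gen_allowed` are hypothetical names, the CRS is modelled as random bytes rather than a real BS.Crs output, and storing one CRS per sid is our reading of "for the first time".

```python
import secrets
from typing import Callable, Dict, Hashable


class CrsFunctionality:
    """Toy model of F_crs: sample the CRS once per sid, then replay it."""

    def __init__(self, bs_crs: Callable[[int], bytes], security_parameter: int):
        self._bs_crs = bs_crs            # stands in for BS.Crs (assumption)
        self._lam = security_parameter
        self._stored: Dict[Hashable, bytes] = {}

    def crs_gen(self, sid: Hashable) -> bytes:
        # First request for this sid: run BS.Crs(1^lambda) and remember the result.
        if sid not in self._stored:
            self._stored[sid] = self._bs_crs(self._lam)
        # Every later request returns the very same CRS.
        return self._stored[sid]


def toy_bs_crs(lam: int) -> bytes:
    # Hypothetical placeholder for BS.Crs: lam random bits, rounded up to bytes.
    return secrets.token_bytes((lam + 7) // 8)


def key_gen_allowed(party_id: str, sid: tuple) -> bool:
    # Wrap(BS) key generation proceeds only if sid = (P_s, sid').
    return len(sid) == 2 and sid[0] == party_id
```

All parties that query the same functionality instance with the same sid see the same Σ, which is what lets the user, signer and verifier in Fig. 2 agree on the CRS without further interaction.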


5.3 Equivalence 


Theorem 2 (UF ∧ EqSimBLND ⇔ F_ncb). Protocol Wrap(BS) securely realizes
F_ncb with respect to adaptive adversaries if and only if BS is unforgeable and
equivocal simulation blind.


“If” direction is proven by constructing a simulator, S, that uses A as a black
box. To run A properly, S simulates entities and their communication in
EXEC_{Wrap(BS),A,Z}. We then apply the game transformation technique starting
from IDEAL_{F_ncb,S,Z}(λ, a) as Game 0. Game 1 removes the use of simulation
algorithms SIM.Crs, SIM.User, SIM.Sig, and SIM.State from F_ncb and S. The
difference is negligible due to simulation blindness. Game 2 then modifies the
verification process of F_ncb so that it no longer cares about the counters. This
modification is justified by unforgeability. Game 3 further modifies the
verification process so that it completely follows the verification function.
The justification is due to completeness and consistency. Game 4 then modifies
F_ncb so that it does not record the signed messages any more. It is justified by
completeness and consistency again. Finally, Game 5 removes unused actions in
F_ncb and S. This is just cosmetic, to make sure that F_ncb and S do nothing but
execute the real protocol. Thus Game 5 is equivalent to EXEC_{Wrap(BS),A,Z}(λ, a).

“Only if” direction is more intricate. First, assuming that BS is not simulation
blind, we show that, for any S, there exists a successful Z. Second, assuming
that BS is simulation blind but forgeable, we construct a successful Z that is not
fooled by any S. For the first part, we construct simulation algorithms SIM.Crs,
SIM.User, SIM.Sig and SIM.State by using S as a subroutine. For such simulation
algorithms there exists an adversary D* that breaks simulation blindness, since
we assumed that BS is not simulation blind. Then we use such D* to construct
Z. A tricky issue in constructing these simulation algorithms is that they do
not share internal state. Since an individual copy of S is run independently in
each of these algorithms, the copies would output different CRSs and public keys.
Our idea is to use the trapdoor as a container of the randomness given to S so
that every simulation algorithm can give the same randomness to S. In this way,
every copy of S works on the same CRS and public key so that all simulation
algorithms work consistently. A formal proof is given in [J].


6 A Generic Construction 


6.1 Overview 


Our starting point is the “basic” blind signature scheme by Fischlin [I]. In
his scheme, a user commits to a message m by sending a commitment c, and the
signer returns a bare signature s on c. Then the user computes a final signature
σ, which actually is a non-interactive zero-knowledge proof of knowledge of
the message m and the valid signature s. Unforgeability is based on the binding
property of the commitment, the unforgeability of the bare signature scheme,
and the knowledge soundness of the NIZK. Blindness follows from the hiding
property of the commitment scheme and the zero-knowledge property of the NIZK.
By BS_G we denote this generic scheme. When transformed by our wrapper,
Wrap(BS_G) securely realizes the non-committing blind signature functionality
F_ncb with respect to static adversaries. (See [I] for details.) It is a surprise
that such a conceptually simple scheme can provide universal composability, even
though the adversary is limited to be static.

An essential issue in handling adaptive security is state reconstruction.
Looking at the structure of BS_G, session equivocality can be easily achieved
by replacing the commitment scheme with a trapdoor commitment scheme. (In
fact, with such a small modification to BS_G, the resulting Wrap(BS_G) provides
adaptive UC security in the erasure model.) On the other hand, signature
equivocality is not generally possible there. Recall that a signature is simulated
by the zero-knowledge simulator. It therefore can be the case that there exists
no randomness that is consistent with a real witness. To overcome this problem,
we eliminate the use of the zero-knowledge simulator by providing a correct
witness to the proof system through the simulation of the bare signature on the
signer side. Namely, we make the signer’s signing algorithm simulatable
by using a signature scheme in the CRS model so that valid signatures can
be created with the trapdoor of the CRS. In this way, we can always provide
a witness to the proof system used in the user-side algorithm. Now, witness
indistinguishability of the proof system assures that the same proof could have
been created from any other witness. Accordingly, a consistent randomness
always exists. This particular structure is suggested in [I7] for the purpose of
removing the CRS in the stand-alone model. We take advantage of the
structure to achieve adaptive security.


6.2 Building Blocks 


—NIWI (Non-interactive Witness Indistinguishable Proof System). It is a non- 
interactive witness indistinguishable proof system of knowledge when the CRS 
is generated in the regular way. By NIWI.Crs, NIWI.Prf and NIWI.Vrf, we denote 
the CRS generation function, the proof generation function and the verification 
function, respectively. Additionally it must allow state reconstruction when the 
CRS is simulated. Namely, one can reconstruct a consistent randomness for a 
given witness and a valid transcript. The Groth-Sahai proof system [I6], the 
GS proof system for short, meets these requirements under the SXDH or DLIN
assumption. It unfortunately does not work for arbitrary NP statements but works
efficiently for relations represented by bilinear products. We thus need to choose
the other building blocks so that they fit the GS proof system for instantiation.


—TC (Trapdoor Commitment Scheme). It is a standard trapdoor commitment
scheme. By TC.Key, TC.Com and TC.Vrf, we denote the key generation function,
the commitment function, and the verification function. There are two more
functions: one generates a random commitment and the other opens the
commitment to an arbitrary value by using the trapdoor generated by TC.Key.
See [I] for an instantiation that works well with the GS proof system under the
SXDH assumption.
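
To make the five functions concrete, here is a minimal Python sketch that uses the textbook Pedersen commitment as a stand-in. It is not the GS-compatible instantiation referred to above, the tiny group parameters are insecure, and the names `tc_key`, `tc_com`, `tc_vrf`, `tc_sim_com` and `tc_equiv` are ours. It only shows how a trapdoor lets a "random" commitment be opened to any message.

```python
import secrets

# Toy group: the order-Q subgroup of Z_P^* generated by G, with P = 2Q + 1.
# These parameters are illustrative only and far too small to be secure.
P, Q, G = 2039, 1019, 4


def tc_key():
    """Return (commitment key h, trapdoor x) with h = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1      # x in [1, Q-1], hence invertible mod Q
    return pow(G, x, P), x


def tc_com(h, m):
    """Commit to m in Z_Q; return (commitment c, opening z)."""
    z = secrets.randbelow(Q)
    return (pow(G, m % Q, P) * pow(h, z, P)) % P, z


def tc_vrf(h, c, m, z):
    """Accept iff (m, z) is a valid opening of c under key h."""
    return c == (pow(G, m % Q, P) * pow(h, z, P)) % P


def tc_sim_com():
    """Produce a random commitment G^k without fixing a message; k is the state."""
    k = secrets.randbelow(Q)
    return pow(G, k, P), k


def tc_equiv(x, k, m):
    """Use trapdoor x to compute an opening of G^k to an arbitrary message m."""
    # We need m + x*z = k (mod Q), so z = (k - m) / x (mod Q).
    return ((k - m) * pow(x, -1, Q)) % Q
```

With `c, k = tc_sim_com()` a simulator can send c before knowing the message and later call `tc_equiv(x, k, m)` to produce an opening that `tc_vrf` accepts, which is the capability that session equivocality relies on.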


—SSIG (Simulatable Signature Scheme). It is a signature scheme in the CRS
model with the special property that valid signatures can be computed from
the public key and the trapdoor behind the CRS. By SSIG.Crs and SSIG.Key, we
denote the CRS generation function and the key generation function. SSIG.Key
takes the CRS and outputs a signing key and a verification key. Besides the
signature generation function SSIG.Sign, there is a signature simulation
function SSIG.Sim that generates valid signatures by using the public key and the
trapdoor generated by SSIG.Crs. We stress that the simulated signatures must
pass the verification by the verification function SSIG.Vrf, but they are not
required to be indistinguishable from the real ones. Similarly, unforgeability is
the standard unforgeability against chosen message attacks. In particular, the
adversary is not given simulated signatures.

Any standard signature scheme can be turned into a simulatable one in an 
unconditional way as follows. Generate two key pairs by running the key gener- 
ation algorithm twice independently. The first key pair is used as the CRS and 
the trapdoor while the second pair is used as the verification and signing key. 
Normal signing is done by using the second key. Simulation is done by the first 
key. A signature is accepted if it passes the original verification predicate with 
respect to either of the keys. 
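
The two-key transformation just described can be sketched as follows. As the underlying "standard" scheme we plug in Lamport one-time signatures purely because they need nothing beyond a hash function; this choice, the function names, and the argument order are assumptions of the sketch, and each Lamport key may sign only one message.

```python
import hashlib
import secrets

N = 256  # number of digest bits that get signed


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(msg: bytes):
    d = int.from_bytes(_h(msg), "big")
    return [(d >> i) & 1 for i in range(N)]


# --- Underlying scheme: Lamport one-time signatures (stand-in, one message per key) ---
def sig_key():
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(N)]
    vk = [(_h(a), _h(b)) for a, b in sk]
    return vk, sk


def sig_sign(sk, msg: bytes):
    # Reveal one preimage per digest bit.
    return [sk[i][b] for i, b in enumerate(_bits(msg))]


def sig_vrf(vk, msg: bytes, sig) -> bool:
    return len(sig) == N and all(
        _h(s) == vk[i][b] for (i, b), s in zip(enumerate(_bits(msg)), sig)
    )


# --- Simulatable wrapper following the two-key construction ---
def ssig_crs():
    return sig_key()                  # (CRS, trapdoor) = first key pair


def ssig_key(crs):
    return sig_key()                  # (verification key, signing key) = second pair


def ssig_sign(crs, sk, msg):
    return sig_sign(sk, msg)          # normal signing uses the second key


def ssig_sim(crs, vk, trapdoor, msg):
    return sig_sign(trapdoor, msg)    # simulation uses the first key


def ssig_vrf(crs, vk, msg, sig) -> bool:
    # Accept if the signature verifies under either of the two keys.
    return sig_vrf(vk, msg, sig) or sig_vrf(crs, msg, sig)
```

Simulated signatures pass `ssig_vrf` but are plainly distinguishable from real ones (they verify under the CRS key instead of vk), which matches the remark that indistinguishability is not demanded.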

To fit the other building blocks, SSIG must be able to sign group elements
and the verification predicate must be represented as a product of pairings. For
such a signature scheme, a feasibility result based on the DLIN assumption can be
found in [5].


6.3 The Scheme 


The CRS generation function BS.Crs computes (Σ_wi, t_wi) ← NIWI.Crs(1^λ),
(Σ_tc, t_tc) ← TC.Key(1^λ), and (Σ_ssig, t_ssig) ← SSIG.Crs(1^λ), and outputs Σ =
(Σ_wi, Σ_tc, Σ_ssig). Key generation function BS.Key is the same as SSIG.Key, which
outputs vk and sk. The signature generation protocol is illustrated in Fig. 3. The
proof system NIWI proves the following relation between witness w = (s, c, z)
and instance x = (vk, Σ_tc, Σ_ssig, m):


TC.Vrf(Σ_tc, c, m, z) = 1 ∧ SSIG.Vrf(Σ_ssig, vk, c, s) = 1


Verification function BS.Vrf takes ((Σ_wi, Σ_tc, Σ_ssig), vk, σ, m) as input and
outputs y ∈ {0,1} such that y ← NIWI.Vrf(Σ_wi, (vk, Σ_tc, Σ_ssig, m), σ).


BS.Signer(Σ, sk)           Σ = (Σ_wi, Σ_tc, Σ_ssig)           BS.User(Σ, vk, m)


                                                    (c, z) ← TC.Com(Σ_tc, m)
                              <------ c ------

s ← SSIG.Sign(Σ_ssig, sk, c)
                              ------ s ------>

Output completed.                                   If SSIG.Vrf(Σ_ssig, vk, s, c) ≠ 1, output ⊥.
                                                    σ ← NIWI.Prf(Σ_wi, x, w) where
                                                    x = (vk, Σ_tc, Σ_ssig, m) and
                                                    w = (s, c, z).
                                                    Output σ.


Fig. 3. Generic blind signature scheme BS_S. The signature generation protocol.

Theorem 3. Protocol Wrap(BS_S) securely realizes F_ncb in the F_crs-hybrid model
with respect to adaptive adversaries without erasures.


We claim that the scheme is session equivocal and signature equivocal. Observe
that setting t_1 = (t_wi, t_ssig) and t_2 = (t_tc) forms separated trapdoors. Session
equivocality is proven by constructing SIM.User and SIM.SesState by using the
trapdoor property of TC. Signature equivocality can be shown by constructing
SIM.Sig and SIM.SigState by using the simulation property of SSIG and the state
reconstructability of NIWI. Thus from Lemma 3 we can say that the scheme
is equivocal simulation blind. We then argue that the scheme is unforgeable
due to the binding property of TC, the unforgeability of SSIG and the proof of
knowledge property of NIWI. Finally Theorem 2 is applied to complete the proof
of Theorem 3.

Abstract. Following the cryptanalyses of the encryption scheme HFE
and of the signature scheme SFLASH, no serious alternative multivariate 
cryptosystems remained, except maybe the signature schemes UOV and 
HFE--. Recently, two proposals have been made to build highly efficient
multivariate cryptosystems around a quadratic internal transformation: 
the first one is a signature scheme called square-vinegar and the second 
one is an encryption scheme called square introduced at CT-RSA 2009. 
In this paper, we present a total break of both the square-vinegar 
signature scheme and the square encryption scheme. For the practical 
parameters proposed by the authors of these cryptosystems, the com- 
plexity of our attacks is about 2*° operations. All the steps of the attack 
have been implemented in the Magma computer algebra system, which allowed us
to experimentally assess the results presented in this paper.


1 Introduction 


There are mainly two motivations behind the construction of multivariate cryp- 
tosystems. The original one is to provide alternatives to the asymmetric schemes 
RSA and those based on Discrete Logarithm problems which are connected to 
number theoretic problems. Multivariate cryptosystems are instead connected 
to the hardness of solving randomly chosen systems of multivariate equations 
over a finite field, a problem which is NP-complete even in the case of quadratic 
polynomials defined over GF(2) when there are at least two such polynomials 
in the system. Moreover, this problem seems to be hard not only for very spe- 
cial instances but also on the average. Another incentive to develop multivariate 
cryptosystems is the expected efficiency that they might offer, a property that 
would be highly appreciated for constrained environments such as RFIDs and 
other embedded devices. Finally, some argue that, contrary
to the problem of factorisation and that of solving discrete logarithms 23], no
quantum algorithm is known for the problem of solving sets of randomly chosen
multivariate equations.

After the introduction of the C* cryptosystem by Matsumoto and Imai in 
I6], there have been several other proposals. Among the most famous ones 
are certainly HFE (Hidden Field Equations) and SFLASH which can be thought 
of as two ways of generalising the C* scheme. Some heuristic design principles 
have followed. A major one, originally suggested by Shamir
in PT], is to remove some equations from the public mapping in the case of
signature schemes; this principle has proven successful in thwarting Patarin’s
attack [I7] against C* (an attack that can be viewed as a preliminary to Gröbner
basis attacks). Another one consists in adding a new set of variables to perturb
the analysis, as in the UOV (Unbalanced Oil and Vinegar) signature scheme [IJ].

Two of the most promising proposals, SFLASH and HFE, have been cryptanalysed
in recent years. Some HFE instances have been shown to succumb to
Gröbner basis attacks in [/] and the complexity of such attacks has been argued
to be quasi-polynomial in [2]. SFLASH has been entirely broken: the missing
equations (due to the minus transformation) can be recovered in most cases as
explained in [6] and the secret key of the resulting C* scheme can be recovered
following the cryptanalysis described in [IQ]. In this context, two new proposals
were based on internal transformations that are not only quadratic on the base
field, but also on the extension field: a signature scheme called square-vinegar
was proposed in [2] and an encryption scheme called square appears in W.


Our Results. In this paper, we expose a total break of both the square-vinegar 
signature and the square encryption proposals from a theoretical point of view 
as well as from a practical point of view. We indeed describe how to recover an 
equivalent secret key for both cryptosystems given the public key alone. For the 
parameters recommended by the authors, our attacks complete in a few minutes 
on a standard PC. These cryptanalyses also represent a theoretical break of the
schemes as, under some reasonable assumptions, their complexity is shown to
be polynomial with respect to the security parameter: the attacks have a time
complexity of O(log²(q) n⁶) since they rely on standard linear algebra on n²
unknowns over a finite field of size q, and n is typically small because the time
complexity of the public computation (signature or encryption) is O(n³). The
attacks are sequences of steps including the discovery of new algebraic invariants
leaking from the public key and a careful analysis of these invariants to sort out
vinegar unknowns from the standard ones. We additionally implemented Magma [B]
programs that were used to verify each of the steps of the cryptanalyses and to 
perform the attacks against the different sets of parameters recommended by the 
designers of the square encryption and square-vinegar signature schemes. Their 
source code is given in the appendix. 
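
The exponents in the stated bound can be traced as follows. This is a back-of-the-envelope reading, assuming that "standard linear algebra" means Gaussian elimination at cubic cost in the number of unknowns and that one multiplication in the field of size q costs O(log² q) bit operations (schoolbook arithmetic); neither assumption is spelled out in the text.

```latex
\[
\underbrace{O\big((n^{2})^{3}\big)}_{\text{elimination on } n^{2} \text{ unknowns}}
\;\times\;
\underbrace{O\big(\log^{2} q\big)}_{\text{one multiplication in } \mathbb{F}_q}
\;=\; O\big(\log^{2}(q)\, n^{6}\big).
\]
```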


2 The Square Cryptosystems 


The square cryptosystems are based on design ideas taken from both the HFE 
cryptosystem and the UOV cryptosystem. However, an important property of 
the square cryptosystems is that they are defined over fields of odd characteristic: 
as their internal transformations are quadratic, the systems would be linear over 
fields of characteristic 2. We begin with a brief reminder on HFE and UOV before
proceeding to the description of the square cryptosystems themselves.


2.1 The HFE Cryptosystem 


The HFE cryptosystem was proposed by Patarin as a possible generalisation
(and strengthening) of the C* scheme proposed by Matsumoto and
Imai in [I6]. Indeed, C* was broken by Patarin [I7], whereas the best attacks
against HFE are Gröbner basis attacks, whose complexity was argued to be
quasi-polynomial M2]. HFE stands for hidden field equations because its internal
transformation is kept secret. This internal transformation F is defined over an
extension E of degree n over some base field F_q, and is chosen to be F_q-quadratic:


\[
F : X \mapsto
\sum_{\substack{0 \le i \le j < n \\ q^{i} + q^{j} \le D}} a_{i,j}\, X^{q^{i} + q^{j}}
+ \sum_{\substack{0 \le k < n \\ q^{k} \le D}} \beta_{k}\, X^{q^{k}}
+ \gamma, \tag{1}
\]


where the coefficients a_{i,j}, β_k, and γ lie in E and D is an upper bound on the
overall degree that makes it practical to invert F through factorization. Since F
is an F_q-quadratic mapping, it can also be expressed over F_q as an n-tuple
(f_1, ..., f_n) of quadratic polynomial mappings in n unknowns, and so can the
composition T ∘ F ∘ S for any pair of one-to-one affine mappings S : F_q^n → E and
T : E → F_q^n. In the case of HFE, the mappings S and T are kept secret and,
together with F, constitute the secret key, whereas the public key is the mapping
G = T ∘ F ∘ S. In order to decrypt, the legitimate user applies the inverse of T,
finds the roots of the univariate polynomials on the extension field E and applies
the inverse of S to each of these roots. The plaintext is one of the roots, which
can be singled out by using some redundancy. In this decryption process, the
knowledge of the secrets S and T is crucial.

Additionally, Shamir’s proposal to remove some (say r) of the n polynomials
that constitute the public key can be applied in the case of a signature scheme:
indeed, to sign a message (y_1, ..., y_{n−r}), the signer first completes the message
with random values y_{n−r+1}, ..., y_n and “decrypts” it normally. This operation
is called the minus transformation and is used in the square-vinegar scheme.

With these notations, $C^*$ is similar to HFE (with an unbounded total degree) 
where all coefficients of the internal transformation are set to zero except 
$\alpha_{0,\theta}$ for a well chosen $\theta$. SFLASH, in turn, is the original 
$C^*$ scheme with the minus transformation applied. 


2.2 The UOV Signature Scheme 


Another ingredient in the design of the square-vinegar signature scheme is the 
use of additional unknowns meant to make the analysis of the scheme harder by 
trying to break the structure used during the decryption process. Such an idea 
was first proposed in the oil and vinegar signature scheme. This scheme uses two 
sets of unknowns $(x_1, \dots, x_n)$ and $(z_1, \dots, z_v)$, respectively called 
the oil and the vinegar variables. The internal transformation then consists of 
an $n$-tuple of polynomials $F = (f_1, \dots, f_n)$ of the special form: 


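$$f_k(x, z) = \sum_{\substack{1 \le i \le n \\ 1 \le j \le v}} \alpha_{i,j}\, x_i z_j \;+\; \sum_{1 \le i \le n} \beta_i\, x_i \;+\; \sum_{1 \le i \le v} \gamma_i\, z_i \;+\; \sum_{1 \le i \le j \le v} \delta_{i,j}\, z_i z_j \;+\; \varepsilon, \tag{2}$$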


where $\alpha_{i,j}$, $\beta_i$, $\gamma_i$, $\delta_{i,j}$, and $\varepsilon$ are 
randomly chosen from the base field $\mathbb{F}_q$. The $x_i$ are called oil 
variables because they do not mix, i.e. there is no cross-term $x_i x_j$. 
Vinegar variables $z_i$, in contrast, mix with other vinegar variables as well as 
with oil variables. The fact that the coefficients of the polynomials are chosen 
randomly is satisfactory since the resulting polynomials look closer to randomly 
chosen ones. However, the two types of variables make it possible to create a 
signature scheme: in order to find some pre-image of $y = (y_1, \dots, y_n)$ 
through $F$, the signer first draws some random values for $z_1, \dots, z_v$ and 
substitutes them in the description of $F$. The resulting set of polynomials 
becomes linear in the oil unknowns $x_i$ and the associated $n \times n$ linear 
system (with $y$ as right-hand side) is easily solved: with probability about 
$1 - 1/q$ (the chance that a random $n \times n$ matrix over $\mathbb{F}_q$ is 
invertible), the system has a solution $(x_1, \dots, x_n)$ which makes $(x, z)$ a 
pre-image of $y$ through $F$, and otherwise another choice for $z$ is made until 
there is a solution. Obviously, this structure has to be hidden from the view of 
an attacker and the public key is the composition $G = F \circ S$ where 
$S : \mathbb{F}_q^{n+v} \to \mathbb{F}_q^{n+v}$ is a one-to-one affine mapping. 
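The signing procedure just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it works over a small prime field $\mathbb{F}_p$ instead of a general $\mathbb{F}_q$, it inverts the central map $F$ only (the secret affine mapping $S$ is left out), and all function names (`solve_mod`, `random_central_map`, `evaluate`, `invert`) are hypothetical helpers, not part of any published implementation.

```python
import random

def solve_mod(A, b, p):
    """Solve A x = b over F_p by Gauss-Jordan elimination.
    Returns the unique solution, or None when A is singular."""
    n = len(A)
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)
        M[col] = [(x * inv) % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def random_central_map(n, v, p, rng):
    """n oil-vinegar polynomials: no oil*oil term, everything else random."""
    rnd = lambda: rng.randrange(p)
    return [{
        "alpha": [[rnd() for _ in range(v)] for _ in range(n)],  # x_i z_j
        "beta": [rnd() for _ in range(n)],                       # x_i
        "gamma": [rnd() for _ in range(v)],                      # z_i
        "delta": [[rnd() for _ in range(v)] for _ in range(v)],  # z_i z_j, i <= j
        "eps": rnd(),
    } for _ in range(n)]

def evaluate(F, x, z, p):
    n, v = len(x), len(z)
    out = []
    for f in F:
        val = f["eps"]
        val += sum(f["alpha"][i][j] * x[i] * z[j] for i in range(n) for j in range(v))
        val += sum(f["beta"][i] * x[i] for i in range(n))
        val += sum(f["gamma"][i] * z[i] for i in range(v))
        val += sum(f["delta"][i][j] * z[i] * z[j] for i in range(v) for j in range(i, v))
        out.append(val % p)
    return out

def invert(F, y, n, v, p, rng):
    """Find (x, z) with F(x, z) = y by fixing random vinegar values z."""
    while True:
        z = [rng.randrange(p) for _ in range(v)]
        # With z fixed, f_k(x, z) = sum_i A[k][i] * x_i + c[k] is affine in x.
        A = [[(f["beta"][i] + sum(f["alpha"][i][j] * z[j] for j in range(v))) % p
              for i in range(n)] for f in F]
        c = evaluate(F, [0] * n, z, p)
        x = solve_mod(A, [(yk - ck) % p for yk, ck in zip(y, c)], p)
        if x is not None:          # singular system: draw new vinegar values
            return x, z
```

A singular system is simply retried with fresh vinegar values, which mirrors the "another choice for $z$" step in the text; a real signer would additionally apply $S^{-1}$ to $(x, z)$.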

The ratio of message size to signature size for the UOV signature scheme is not 
optimal, since the number of vinegar unknowns must be at least twice as large as 
the number of oil unknowns for the scheme to be secure. 


2.3 The Square-Vinegar Signature Scheme 


The square-vinegar signature scheme strives to provide an efficient alternative 
to UOV or HFE with the minus transformation applied. Let $\mathbb{F}_q$ be a 
finite field and $\mathbb{E}$ be an extension of degree $n$ over $\mathbb{F}_q$. 
The internal transformation of the square-vinegar scheme is defined as: 


$$F : \mathbb{E} \times \mathbb{F}_q^v \to \mathbb{E}, \quad (X, X_v) \mapsto \alpha X^2 + \beta(X_v)\, X + \gamma(X_v), \tag{3}$$


where $\alpha$ is a constant randomly chosen from $\mathbb{E}$, 
$\beta : \mathbb{F}_q^v \to \mathbb{E}$ is a randomly chosen affine mapping, and 
$\gamma : \mathbb{F}_q^v \to \mathbb{E}$ is a randomly chosen 
$\mathbb{F}_q$-quadratic mapping. This internal transformation is hidden by two 
full rank affine mappings $S : \mathbb{F}_q^{n+v} \to \mathbb{E} \times \mathbb{F}_q^v$ 
and $T : \mathbb{E} \to \mathbb{F}_q^n$. Therefore $S$ mixes the vinegar unknowns 
$X_v$ with the “normal” unknowns $X$. In addition to $T$, a projection $\Pi$ is 
applied, which removes $r$ of the $n$ components as in SFLASH or HFE$^-$. The 
affine transforms $S$ and $T$ together with the mappings $\gamma$, $\beta$, and 
the constant $\alpha$ constitute the secret key. The public key $P$ results from 
the composition of these mappings: $P = \Pi \circ T \circ F \circ S$. 

The use of an odd characteristic base field is advertised by the authors as 
a means to thwart Gröbner basis attacks, since introducing the corresponding 
field equations in the computation renders it impractical. Mixing the vinegar 
unknowns with the normal ones breaks the algebraic relations between the input 
and the output that appeared in $C^*$ (bilinear relations [17]) or HFE (algebraic 
relations of higher degree, as explained in [7,12]). Finally, just as for HFE$^-$, 
removing part of the output information further mitigates Gröbner basis attacks 
and prevents Kipnis and Shamir’s attack developed against UOV. 


Signature. The signing process is highly efficient. It only requires the holder 
of the secret key to randomly pick $r$ elements from $\mathbb{F}_q$ to complete 
the message $m = (m_1, \dots, m_{n-r})$ to be signed into 
$\tilde{m} = (m_1, \dots, m_{n-r}, m_{n-r+1}, \dots, m_n)$ and to invert the 
public mapping in three steps: $S^{-1} \circ F^{-1} \circ T^{-1}(\tilde{m})$. 
Applying $T^{-1}$ and $S^{-1}$ is a matter of multiplying by precomputed matrices, 
and inverting $F$ requires finding the roots of a quadratic univariate polynomial 
over $\mathbb{E}$. In case there is no solution, the signer restarts the process 
by choosing another way of completing the message $m$ into $\tilde{m}$. 


2.4 The Square Encryption Scheme 


A companion scheme to this square-vinegar signature scheme has been proposed 
in [4]. The square encryption scheme strives to provide an efficient and secure 
alternative to HFE and, like the square-vinegar scheme, has a square internal 
transformation: $F : \mathbb{E} \to \mathbb{E}$, $X \mapsto X^2$. The parameters 
are chosen so that the size of the base field satisfies $q \equiv 3 \bmod 4$ and 
the degree $n$ of $\mathbb{E}$ over $\mathbb{F}_q$ is odd. The transformation $F$ 
is again hidden by two full rank affine mappings 
$S : \mathbb{F}_q^{n-r} \to \mathbb{E}$ and $T : \mathbb{E} \to \mathbb{F}_q^n$, 
which yields a public key $P = T \circ F \circ S$. 
(Following [5], the authors proposed to fix $r$ of the input unknowns to a 
predefined value (say, zero) to prevent the attacker from controlling the 
differential of the public key as in Dubois, Fouque, Shamir, and Stern’s 
cryptanalysis [6].) 


This scheme is somewhat reminiscent of the $C^*$ scheme, where 
$F(X) = X^{1+q^\theta}$ for a well chosen $\theta$. But for the square encryption 
scheme, where $\theta = 0$, the bilinear relation 
$X Y^{q^\theta} = X^{q^{2\theta}} Y$ between $X$ and $Y = F(X)$ boils down to the 
tautology $XY = YX$. The embedding $S$ aims to finish hiding the algebraic 
structure of the internal transformation. 


Decryption. The holder of the secrets is able to decrypt very efficiently: in 
addition to finding pre-images through $T$ and $S$, which amounts to solving 
simple linear systems, the decryption process requires computing a square root in 
the extension field $\mathbb{E}$. The square root is computed by the 
square-and-multiply algorithm as $\sqrt{Y} = \pm Y^{(q^n+1)/4}$, since 
$q^n \equiv 3 \bmod 4$. As there are two possible square roots, the right one is 
singled out as the one lying in the image of $S$. 
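The exponentiation trick behind this square root can be illustrated over a prime field. The sketch below is an assumption-laden toy: it uses $\mathbb{F}_p$ with $p \equiv 3 \bmod 4$ in place of the extension field $\mathbb{E}$ (where the same identity holds with $q^n$ in place of $p$), and `sqrt_3mod4` is a hypothetical helper name.

```python
def sqrt_3mod4(y, p):
    """Square root of y in F_p for a prime p with p % 4 == 3.
    Returns one root r (the other is p - r), or None if y is not a square."""
    assert p % 4 == 3
    r = pow(y, (p + 1) // 4, p)      # square-and-multiply exponentiation
    return r if (r * r - y) % p == 0 else None
```

When $y$ is a square, $r^2 = y^{(p+1)/2} = y \cdot y^{(p-1)/2} = y$, which is why a single exponentiation suffices; the final check rejects non-squares.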


3 Cryptanalysis of the Square-Vinegar Signature Scheme 


We now describe a generic and very efficient attack against the square-vinegar 
signature scheme. Our attack proceeds in three steps: we first exhibit an 
invariant of the internal transformation and recover it through the analysis of 
the differential of the public key; then, we use this information to recover an 
equivalent representation of the vinegar space; in a third step, we transform the 
public key into a special shape that allows us to invert it efficiently. Put 
together, these three steps allow us to forge a signature for any given message. 


3.1 Alternative Decompositions 


Recall that the internal transformation of the square-vinegar signature scheme 
has the following structure: 

$$F : \mathbb{E} \times \mathbb{F}_q^v \to \mathbb{E}, \quad (X, X_v) \mapsto \alpha X^2 + \beta(X_v)\, X + \gamma(X_v),$$

where $\alpha$ is a constant, $\beta : \mathbb{F}_q^v \to \mathbb{E}$ is an affine 
mapping, and $\gamma : \mathbb{F}_q^v \to \mathbb{E}$ is an 
$\mathbb{F}_q$-quadratic mapping, where $\mathbb{E}$ is an extension of degree $n$ 
over $\mathbb{F}_q$. The public key is the mapping 
$P = \Pi \circ T \circ F \circ S$, where $\Pi$ is a projection that removes $r$ 
polynomials, and $S : \mathbb{F}_q^{n+v} \to \mathbb{E} \times \mathbb{F}_q^v$ and 
$T : \mathbb{E} \to \mathbb{F}_q^n$ are two affine mappings of full rank. The 
decomposition $(T, F, S)$ of the public key is kept secret. 

A major component of the internal transformation $F$ is the mixing of vinegar 
unknowns with $X$. It makes it harder for an attacker to use the specific 
structure of $F$ viewed as a univariate quadratic polynomial in $X$. A crucial 
remark is that there exist mappings that, when composed with the internal 
transformation, not only preserve its special form, but also discard the part of 
$F$ mixing the vinegar unknowns $X_v$ with $X$. Indeed, consider the mappings 
$\sigma : (X, X_v) \mapsto (X - \frac{1}{2\alpha}\beta(X_v), X_v)$ and 
$\tau : Y \mapsto \frac{1}{\alpha} Y$. (Remember that the scheme is defined over 
a field $\mathbb{F}_q$ of odd characteristic.) It can be checked that these 
mappings provide an alternative decomposition 
$(T \circ \tau^{-1}, \tilde{F}, \sigma^{-1} \circ S)$ of the public key, with 
$\tilde{F} = \tau \circ F \circ \sigma$, such that 

$$\tilde{F} : (X, X_v) \mapsto X^2 + \tilde{\gamma}(X_v), \tag{4}$$


where $\tilde{\gamma}$ is an $\mathbb{F}_q$-quadratic mapping. We stress here that 
an attacker does not need to know the mappings $\sigma$ and $\tau$ but rather 
assumes without loss of generality that the public key follows the specific 
decomposition (4). (Also note that, in a similar fashion, keeping secret the 
defining polynomial of the extension has no effect: as two fields of the same 
size are isomorphic and the isomorphism is a linear bijection, any arbitrary 
choice made by the attacker is “absorbed” in $S$ and $T$.) This last 
decomposition can be further tweaked as in [11] to remove the affine parts of the 
mappings $S$ and $T$, but at the expense of reintroducing a linear term in $X$, 
leading to an internal transformation of the following shape: 

$$F' : (X, X_v) \mapsto X^2 + \beta' X + \gamma'(X_v), \tag{5}$$


where $\beta'$ is a constant from $\mathbb{E}$ and $\gamma'$ is some 
$\mathbb{F}_q$-quadratic mapping. In the following sections, the attacker can 
therefore just assume without loss of generality that the public key is 
decomposed as $(T', F', S')$ where $S'$ and $T'$ are linear mappings, and $F'$ is 
as given in (5): then $(T', F', S')$ contains enough information to forge valid 
signatures and thus constitutes an equivalent secret key. We call such a 
decomposition a “split decomposition” (the unknowns $X$ and $X_v$ are now 
separated in the internal transformation). A split decomposition is not unique: 
iterates of the Frobenius mapping $\varphi : z \mapsto z^q$ and multiplications 
$\Lambda_u : z \mapsto uz$, $u \in \mathbb{E}$, do not alter the prescribed shape 
of the internal transformation (though coefficients might change). In particular, 
if $(T_0, F_0, S_0)$ is a split decomposition, so are 
$(T_0 \circ \Lambda_{u^{-2}},\ \Lambda_{u^2} \circ F_0 \circ \Lambda_{u^{-1}},\ \Lambda_u \circ S_0)$ 
and $(T_0 \circ \varphi^{-i},\ \varphi^i \circ F_0 \circ \varphi^{-i},\ \varphi^i \circ S_0)$. 
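The removal of the cross term can be verified by completing the square. The short derivation below assumes the reading $\sigma : (X, X_v) \mapsto (X - \frac{1}{2\alpha}\beta(X_v), X_v)$ and $\tau : Y \mapsto \frac{1}{\alpha}Y$ for the two mappings of this section, and writes $\beta$, $\gamma$ for $\beta(X_v)$, $\gamma(X_v)$; it is a sanity check under that assumption rather than a quotation of the original argument.

```latex
\begin{align*}
\tau \circ F \circ \sigma\,(X, X_v)
  &= \frac{1}{\alpha}\left[\alpha\left(X - \frac{\beta}{2\alpha}\right)^{2}
     + \beta\left(X - \frac{\beta}{2\alpha}\right) + \gamma\right] \\
  &= X^{2} - \frac{\beta}{\alpha}X + \frac{\beta^{2}}{4\alpha^{2}}
     + \frac{\beta}{\alpha}X - \frac{\beta^{2}}{2\alpha^{2}} + \frac{\gamma}{\alpha} \\
  &= X^{2} + \underbrace{\frac{\gamma(X_v)}{\alpha}
     - \frac{\beta(X_v)^{2}}{4\alpha^{2}}}_{\text{quadratic in } X_v}.
\end{align*}
```

Since $\beta$ is affine and $\gamma$ is quadratic in $X_v$, the remaining term is again $\mathbb{F}_q$-quadratic in $X_v$; the division by $2$ is where the odd characteristic is used.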


3.2 Using the Multiplicative Property of the Differential 


In the previous section we showed how to discard the cross-contribution of $X_v$ 
and $X$. However, the contribution $\gamma(X_v)$ still disturbs the algebraic 
properties of the univariate quadratic in $X$. In order to circumvent this 
difficulty, we make use of a tool first introduced by Fouque et al. that proved 
very useful in attacking multivariate cryptosystems: the differential of the 
public mapping. The differential of $P$ at $a$ is defined as: 
$DP_a(x) = P(x + a) - P(x) - P(a) + P(0)$. 
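As a quick illustration of this definition, the following Python sketch evaluates the differential of a random quadratic map and lets one check that it is symmetric and additive in each argument. It is a toy under stated assumptions: the map lives over a small prime field $\mathbb{F}_p$, and `random_quadratic_map` and `differential` are hypothetical helper names.

```python
import random

def random_quadratic_map(n_in, n_out, p, rng):
    """A random quadratic map F_p^n_in -> F_p^n_out, returned as a function."""
    quad = [[[rng.randrange(p) for _ in range(n_in)] for _ in range(n_in)]
            for _ in range(n_out)]
    lin = [[rng.randrange(p) for _ in range(n_in)] for _ in range(n_out)]
    const = [rng.randrange(p) for _ in range(n_out)]

    def P(x):
        return [(sum(quad[k][i][j] * x[i] * x[j]
                     for i in range(n_in) for j in range(n_in))
                 + sum(lin[k][i] * x[i] for i in range(n_in))
                 + const[k]) % p
                for k in range(n_out)]
    return P

def differential(P, a, x, p):
    """DP_a(x) = P(x + a) - P(x) - P(a) + P(0), coordinate-wise mod p."""
    xa = [(s + t) % p for s, t in zip(x, a)]
    zero = [0] * len(x)
    return [(s - t - u + w) % p
            for s, t, u, w in zip(P(xa), P(x), P(a), P(zero))]
```

The linear and constant parts of `P` cancel in `differential`, so only the quadratic coefficients contribute, which is why the result is bilinear.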

In the case of an $\mathbb{F}_q$-quadratic mapping, $DP_a$ is such that 
$(x, a) \mapsto DP_a(x)$ is a symmetric bilinear mapping. From now on, we denote 
this bilinear mapping by $DP$ and call it the differential of $P$. The 
differential map corresponding to the internal transformation 
$(X, X_v) \mapsto X^2 + \beta X + \gamma(X_v)$ of a square-vinegar instance is: 

$$DF((X, X_v), (Y, Y_v)) = 2XY + D\gamma(X_v, Y_v). \tag{6}$$


The success of the attack lies in the fact that normal ($X$) and vinegar ($X_v$) 
unknowns are separated in the expression of the differential $DF$. More 
precisely, the only linear mappings $L$ such that for all $(X, X_v)$ and all 
$(Y, Y_v)$: 

$$DF((L(X), X_v), (Y, Y_v)) - DF((X, X_v), (L(Y), Y_v)) = 0 \iff L(X)\,Y = X\,L(Y)$$

are $Z \mapsto \lambda Z$ for $\lambda \in \mathbb{E}$. Indeed, any solution 
$L : Z \mapsto \sum_{0 \le i < n} l_i Z^{q^i}$ satisfies 
$\sum_{0 \le i < n} l_i X^{q^i} Y = \sum_{0 \le i < n} l_i X Y^{q^i}$ for all $X$ 
and $Y$, and since the mappings $(X, Y) \mapsto X^{q^i} Y^{q^j}$ form a basis of 
the space of bilinear forms, we must have $l_i = 0$ for all $i > 0$. 
In addition, we conjecture that with very high probability (with respect to 
the uniform choice of the coefficients of $\gamma$) the only linear mappings $L$ 
satisfying 

$$\forall X_v\, \forall Y_v \quad D\gamma(L(X_v), Y_v) - D\gamma(X_v, L(Y_v)) = 0$$

are $Z_v \mapsto cZ_v$ for some $c \in \mathbb{F}_q$. This might be heuristically 
justified by the fact that the random choice of $\gamma$ does not allow such an 
algebraic property to appear, and is verified experimentally. Assuming this 
conjecture is true, we have: 


Proposition 1. For a random instance of the square-vinegar scheme, it happens 
with very high probability that the only linear mappings $L$ satisfying: 

$$\forall (X, X_v)\, \forall (Y, Y_v) \quad DF(L(X, X_v), (Y, Y_v)) - DF((X, X_v), L(Y, Y_v)) = 0 \tag{7}$$

are $(Z, Z_v) \mapsto (\lambda Z, cZ_v)$, where $\lambda \in \mathbb{E}$ and 
$c \in \mathbb{F}_q$. 


Proof. Write $L : (Z, Z_v) \mapsto (A(Z) + C(Z_v),\ C'(Z) + B(Z_v))$ for some 
solution of (7). Since the equation holds for all inputs of $DF$, consider it 
specialised at $X_v = 0$ and $Y_v = 0$, with $DF$ replaced by its expression (6): 

$$\forall X\, \forall Y \quad [2A(X)Y + D\gamma(C'(X), 0)] - [2XA(Y) + D\gamma(0, C'(Y))] = 0.$$

As $D\gamma(*, 0) = 0$ and $D\gamma(0, *) = 0$ for any ‘$*$’, this gives 
$A(X)Y = XA(Y)$ which, as we saw above, implies $A : Z \mapsto \lambda Z$ for 
$\lambda \in \mathbb{E}$. Similarly, at $X = 0$ and $Y = 0$, (7) becomes: 
$\forall X_v\, \forall Y_v\ D\gamma(B(X_v), Y_v) - D\gamma(X_v, B(Y_v)) = 0$, 
implying $B : Z_v \mapsto cZ_v$ for $c \in \mathbb{F}_q$ by the conjecture. 
Finally, at $X = 0$ and $Y_v = 0$, (7) becomes: 
$\forall X_v\, \forall Y\ D\gamma(X_v, C'(Y)) = 2C(X_v)Y$. Assume for a 
contradiction that $C$ is not identically null. Then setting $X_v = x_1$ such 
that $C(x_1) \neq 0$, the right-hand side spans a vector space of dimension $n$ 
(as $Y$ ranges over $\mathbb{E}$) while the left-hand side spans a vector space of 
dimension at most $v$. Hence, when $v < n$ as in a square-vinegar instance, $C$ 
must be identically null. Then, for all $(X_v, Y)$, we have 
$D\gamma(X_v, C'(Y)) = 0$ or equivalently 
$\gamma(X_v + C'(Y)) = \gamma(X_v) + \gamma(C'(Y)) - \gamma(0)$. In particular, 
this holds for $X_v = C'(X)$ for any $X$ and any $Y$, so that 
$Z \mapsto \gamma(C'(Z))$ is affine, that is, $\gamma$ is affine over 
$\mathrm{Im}(C')$. For a random $\gamma$ it is improbable that $\gamma$ is affine 
over some (non-zero) subspace. Hence, with high probability, $C'$ is identically 
null. 
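The conjecture used in this proof lends itself to a small experiment. In coordinates over $\mathbb{F}_q$, the differential of a quadratic $\gamma$ with $n$ output coordinates is given by $n$ symmetric $v \times v$ matrices $G_k$, and the condition on $L$ reads $L^{T} G_k = G_k L$ for all $k$. The Python sketch below (a toy over a prime field, with hypothetical helper names `rank_mod` and `commutant_dimension`) draws the $G_k$ at random and computes the dimension of the solution space; a dimension of 1 means that only the scalar mappings $Z_v \mapsto cZ_v$ survive.

```python
import random

def rank_mod(rows, p):
    """Rank over F_p of a matrix given as a list of rows."""
    rows = [[x % p for x in r] for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        piv = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def commutant_dimension(v, n, p, rng):
    """Dimension of {L : L^T G_k = G_k L for all k} for n random
    symmetric v x v matrices G_k over F_p."""
    eqs = []
    for _ in range(n):
        G = [[0] * v for _ in range(v)]
        for i in range(v):
            for j in range(i, v):
                G[i][j] = G[j][i] = rng.randrange(p)
        # Unknown L[a][b] has index a*v + b; one equation per entry (i, j).
        for i in range(v):
            for j in range(v):
                row = [0] * (v * v)
                for a in range(v):
                    # (L^T G)[i][j] = sum_a L[a][i] * G[a][j]
                    row[a * v + i] = (row[a * v + i] + G[a][j]) % p
                    # (G L)[i][j]   = sum_a G[i][a] * L[a][j]
                    row[a * v + j] = (row[a * v + j] - G[i][a]) % p
                eqs.append(row)
    return v * v - rank_mod(eqs, p)
```

For instance, `commutant_dimension(4, 31, 31, random.Random(0))` uses the sizes $v = 4$, $n = 31$, $q = 31$ that appear in the Magma script of Appendix B; scalar matrices always satisfy the condition, so the result is at least 1.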


This property of $F$ naturally carries over to the public key, provided the 
removal of polynomials does not completely destroy its algebraic structure: 


Claim 1. If the number of coordinates removed by the projection $\Pi$ is less 
than half and the coefficients of $\gamma$ are randomly chosen, the set of linear 
mappings $L$ satisfying 

$$\forall X\, \forall Y \quad DP(L(X), Y) - DP(X, L(Y)) = 0$$

is $\{S^{-1} \circ \Lambda_{u,c} \circ S\}_{u \in \mathbb{E},\, c \in \mathbb{F}_q}$, 
i.e. the conjugates by the secret mapping $S$ of all the multiplications 
$\Lambda_{u,c} : (X, X_v) \mapsto (uX, cX_v)$, where $u \in \mathbb{E}$ and 
$c \in \mathbb{F}_q$. 


3.3 Extracting the Vinegar Vector Space 


The solution set $\mathcal{X}$ of Claim 1 can be easily determined, as it amounts 
to solving a linear system of $(n - r)(n + v)^2$ equations in the $(n + v)^2$ 
unknowns of $L$ over a finite field of size $q$. Let us call “vinegar vector 
space” the image through $S$ of all the values $v$ such that the first $n$ 
coordinates of $S(v)$ are zero. Similarly, let us call “normal vector space” the 
image through $S$ of all the values $v$ such that the last $v$ coordinates equal 
zero. Before explaining how to use the knowledge of $\mathcal{X}$ to recover 
these two vector spaces, let us state three useful lemmas. 


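Lemma 1. Let $u$ be in $\mathbb{E}$, $\pi_u$ be the minimal polynomial of $u$ 
over $\mathbb{F}_q$, and $\chi_{\Lambda_{u,c}}$ be the characteristic polynomial 
of $\Lambda_{u,c} : (X, X_v) \mapsto (uX, cX_v)$. Then: 

$$\chi_{\Lambda_{u,c}}(x) = (x - c)^v \cdot \pi_u(x)^{n/\deg(\pi_u)}.$$


Lemma 2. Let $u$ be in $\mathbb{E}$ and $\pi_u$ the minimal polynomial of $u$ 
over $\mathbb{F}_q$. Then: 

$$\pi_u(x) = (x - u)(x - u^q) \cdots (x - u^{q^{\deg(\pi_u)-1}}).$$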


Lemma 3 (Thm. 3.25 [15]). The number of irreducible monic polynomials of 
degree $n$ in $\mathbb{F}_q[X]$ is $\frac{1}{n} \sum_{d \mid n} \mu(d)\, q^{n/d}$, 
where $\mu$ is the Möbius function.¹ It follows that the number of elements in 
$\mathbb{E}$ with a minimal polynomial of degree $n$ is at least 
$q^n - q^{\lfloor n/2 \rfloor} - q^{\lfloor n/2 \rfloor - 1} - \cdots - q^2 - q$. 
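The counting formula for monic irreducible polynomials of degree $n$ over $\mathbb{F}_q$, namely $\frac{1}{n}\sum_{d \mid n} \mu(d)\, q^{n/d}$ with $\mu$ the Möbius function, is easy to evaluate numerically. The Python sketch below is only an illustration; the function names `mobius`, `count_irreducible`, and `degree_n_elements` are hypothetical.

```python
def mobius(x):
    """Moebius function: 0 if x has a squared prime factor,
    otherwise (-1)**k where k is the number of distinct prime factors."""
    k, d = 0, 2
    while d * d <= x:
        if x % d == 0:
            x //= d
            if x % d == 0:
                return 0
            k += 1
        d += 1
    if x > 1:
        k += 1
    return -1 if k % 2 else 1

def count_irreducible(q, n):
    """Number of monic irreducible polynomials of degree n over F_q."""
    total = sum(mobius(d) * q ** (n // d)
                for d in range(1, n + 1) if n % d == 0)
    return total // n

def degree_n_elements(q, n):
    """Number of elements of F_{q^n} whose minimal polynomial has degree n
    (each irreducible polynomial of degree n has n distinct roots)."""
    return n * count_irreducible(q, n)
```

For a prime degree such as $n = 31$ with $q = 31$, `degree_n_elements(31, 31)` equals $q^n - q$, so all but the $q$ elements of the base field have a minimal polynomial of full degree.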


Let $M$ be any element picked at random from the solution set $\mathcal{X}$ of 
Claim 1. Since $M = S^{-1} \circ \Lambda_{u,c} \circ S$ for some 
$(u, c) \in \mathbb{E} \times \mathbb{F}_q$, $M$ and $\Lambda_{u,c}$ are conjugate 
and thus have the same characteristic polynomial 
$\chi_M(x) = (x - c)^v \cdot \pi_u(x)^{n/\deg(\pi_u)}$ according to Lemma 1. In 
addition, Lemma 3 shows that for $u$ chosen uniformly at random in $\mathbb{E}$, 
$\deg(\pi_u)$ has more than $1 - q/q^{n/2}$ chances to be $n$. We can therefore 
assume in the following that $c$ and $\pi_u$ are known from the factorization 
of $\chi_M$. 

¹ $\mu(1) = 1$, $\mu(x) = (-1)^k$ for $x$ a product of $k$ distinct primes, and $\mu(x) = 0$ otherwise. 

The factorization of $\pi_u$ over $\mathbb{E}$ in turn discloses $u^{q^i}$ for 
some unknown $i$. However, as stated at the end of Section 3.1, the split 
decomposition is not affected by iterates of the Frobenius mapping and thus it is 
enough to solve for $S$ in the following linear system: 

$$S \circ M = \Lambda_{u^{q^i},c} \circ S.$$

Any particular solution $S_0$ of this system is sufficient, since the whole space 
of solutions is a coset of the commutant of $\Lambda_{u^{q^i},c}$. The commutant 
of $X \mapsto u^{q^i} X$ is the space of multiplications, since $u$ does not 
belong to any proper subfield of $\mathbb{E}$. On the contrary, the commutant of 
$X_v \mapsto cX_v$ is the whole space of $\mathbb{F}_q$-linear mappings, 
precisely because $c$ lies in $\mathbb{F}_q$. At this point, the attacker is 
almost in the same position as the legitimate signer to produce a signature, 
since he has access to the vinegar space through $S_0$ and can now work on 

$$P \circ S_0^{-1}(X, X_v) = \Pi \circ T \circ (X^2 + \beta X + \gamma(X_v))$$

instead of the original public key $P$. Let us define 
$\tilde{P} = P \circ S_0^{-1}$. 

The next step of the attack is to recover a mapping equivalent to $T$. To this 
end, we seek to cancel the part of $\tilde{P}$ that is linear in $X$, which can 
be achieved by using an adequate change of variables $X \mapsto (X - b)$, where 
$b$ is to be determined. The expression of $\tilde{P}(X - b)$ with respect to $X$ 
in turn contains a quadratic part, a linear part, and a constant part. Looking at 
the linear part alone, the attacker writes down that all the coefficients of $X$ 
are equal to zero; these coefficients are a set of $(n - r)$ affine functions 
with respect to $b$, and solving for $b$ allows the attacker to recover $\beta$. 
The final step is to recover an equivalent version of $T$. This is done by 
considering the part of $\tilde{P}$ that is quadratic with respect to $X$: 
$Q(X) = \Pi \circ T(X^2)$. By composing with multiplications over $\mathbb{E}$, 
it is possible to complete the $(n - r)$ coordinates of $Q$ into a full set 
$\tilde{Q}(X)$ of $n$ coordinates by taking a basis of 
$\{Q(\lambda X)\}_{\lambda \in \mathbb{E}}$. Then, solving for $T$ in 
$\tilde{Q}(X) = T(X^2)$ gives an equivalent representation $T_0$ of $T$. 

At this point, the attacker has gained the knowledge of $S_0$, $T_0$, and $b_0$ 
such that: 

$$P \circ S_0^{-1}(X, X_v) = \Pi \circ T_0 \circ (X^2 + b_0 X) + P \circ S_0^{-1}(0, X_v).$$

We claim that this is equivalent to the knowledge of the secret key, since the 
attacker is then able to sign any message $m$ as efficiently as the legitimate 
signer, as follows. Draw some random value $X_v$ from the vinegar space and 
randomly complete the $(n - r)$ coordinates of $m - P \circ S_0^{-1}(0, X_v)$ into 
an $n$-coordinate value $\tilde{m}$. Compute $Y = T_0^{-1}(\tilde{m})$ and solve 
for $X_0$ in $(X_0 + \frac{1}{2} b_0)^2 = Y + \frac{1}{4} b_0^2$. A signature of 
$m$ is then given by $S_0^{-1}(X_0, X_v)$. 


3.4 Complexity Analysis and Practical Parameters 

Our attack requires $O(\log^2(q)\,(n + v)^6)$ operations to find the solution set 
$\mathcal{X}$ of Claim 1, and a lower-order number of operations to factor the 
characteristic polynomial $\chi_M$. The particular solution $S_0$ is found with 
$O(\log^2(q)\,(n + v)^6)$ operations. The complexity of the other steps can be 
neglected and thus the attack has an overall complexity of 
$O(\log^2(q)\,(n + v)^6)$. 
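To get a feeling for the size of this bound, one can evaluate it numerically. The sketch below assumes that the dominant cost is $\log^2(q)\,(n+v)^6$, i.e. Gaussian elimination on a linear system in $(n+v)^2$ unknowns with each field operation costing about $\log^2 q$; it ignores the constant hidden in the $O(\cdot)$ notation, and `attack_cost_log2` is a hypothetical helper name.

```python
from math import log2

def attack_cost_log2(q, n, v):
    """log2 of log2(q)^2 * (n + v)^6, constants ignored."""
    return 2 * log2(log2(q)) + 6 * log2(n + v)

# Sizes used by the Magma script of Appendix B: q = 31, n = 31, v = 4.
estimate = attack_cost_log2(31, 31, 4)
print(f"about 2^{estimate:.1f} bit operations")
```

This is only an order-of-magnitude estimate under the stated assumption, far below the claimed 80-bit security level.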

The authors of the square-vinegar signature scheme claimed 80-bit security 
for the following parameter sets: 

[Table: parameter set 1 | parameter set 2; rows include the number of removed polynomials r] 


The complexity of our attack is about 2°° and our Magma program in appendix 
completes within minutes for both parameter sets on a common desktop PC. 


4 Cryptanalysis of the Square Encryption Scheme 


The square encryption scheme poses new challenges to the attacker. Its design 
strategy of embedding the plaintext into a bigger space before applying the 
internal transformation makes it impossible to use the differential mapping as 
was done previously. This is due to the restricted view the attacker has of the 
input space, which does not allow easy manipulation of the inside of the 
differential. In our attack against the square encryption scheme, we therefore 
use a different technique. Instead of peeling off the cryptosystem from the 
input, we peel it off from the output. 


4.1 Equivalent Representation of the Secret Key 


Due to the specific form of the internal transformation and without loss of 
generality, we may give the following alternative decomposition of the public 
key: 

$$P(X) = T(S(X)^2) + T(s \cdot S(X)) + t, \tag{8}$$

where $S$ and $T$ are the linear parts of the original secret affine mappings, 
and $s = 2\sigma$ and $t = \tau + T(\sigma^2)$ with $\sigma$ and $\tau$ the 
original secret constants. Since the mappings $S$ and $T$ are linear, it can be 
easily seen that with respect to the input $X$, the first term of (8) is 
$\mathbb{F}_q$-quadratic, the second term is linear, and the third term is 
constant. Furthermore, these three homogeneous terms can be read directly from 
the public key itself, so that the attacker knows the following: 

$$P_2(X) = T(S(X)^2), \qquad P_1(X) = T(s \cdot S(X)), \qquad P_0(X) = t.$$
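Reading the homogeneous parts off a quadratic map only needs three evaluations in odd characteristic: at $x$, at $-x$, and at $0$. The Python sketch below illustrates this over a prime field $\mathbb{F}_p$; it treats the map as a black-box function, and `homogeneous_parts` is a hypothetical helper name.

```python
def homogeneous_parts(P, x, p):
    """Return (P2(x), P1(x), P0) for a quadratic map P over F_p, p odd:
    P1(x) = (P(x) - P(-x)) / 2 and P2(x) = (P(x) + P(-x)) / 2 - P(0)."""
    inv2 = pow(2, -1, p)
    neg = [(-t) % p for t in x]
    Px, Pn, P0 = P(x), P(neg), P([0] * len(x))
    P1 = [((a - b) * inv2) % p for a, b in zip(Px, Pn)]
    P2 = [((a + b) * inv2 - c) % p for a, b, c in zip(Px, Pn, P0)]
    return P2, P1, P0

# Toy map with one output coordinate: 3*x0^2 + x0*x1 + 5*x1 + 7 over F_31.
toy = lambda x: [(3 * x[0] * x[0] + x[0] * x[1] + 5 * x[1] + 7) % 31]
print(homogeneous_parts(toy, [2, 3], 31))
```

The appendix routine that extracts coefficient sequences relies on the same evaluations at $\pm 1$ on each coordinate.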


4.2 Looking for Invariant Subspaces 


As with the signature scheme, the differential of the public key provides useful 
information to the attacker. In the case of the square encryption scheme, it can 
be expressed as: 

$$DP(X, Y) = T(2 \cdot S(X) \cdot S(Y)).$$

Let us consider the partial mappings $DP_y : X \mapsto DP(X, y)$. Since 
$S : \mathbb{F}_q^{n-r} \to \mathbb{E}$ has full rank, its image is of dimension 
$(n - r)$. Hence, choosing any linearly independent vectors 
$y_1, \dots, y_{n-r}$ makes $DP_{y_1}, \dots, DP_{y_{n-r}}$ span the whole vector 
space of mappings $\{DP_y\}_{y \in \mathbb{F}_q^{n-r}}$. This shows that the 
attacker is able to derive a set of mappings 
$\mathcal{A} = \{P_1\} \cup \{DP_{y_i}\}_{i=1,\dots,n-r}$, each of which has the 
special form $T \circ \Lambda_a \circ S$, where $\Lambda_a$ stands for the 
multiplication by $a$ in $\mathbb{E}$. This set of mappings can then be rewritten 
as $\mathcal{A} = \{T \circ \Lambda_{\lambda_i} \circ S\}_{i=1,\dots,n-r+1}$, where 
the $n - r + 1$ values $\lambda_1, \dots, \lambda_{n-r+1}$ are unknown, but 
linearly independent. The attacker does not need to know the actual values of the 
$\lambda_i$ since he can exploit this set of mappings as follows. The general 
idea is to look for linear mappings $L$ that can link the public equations, say 
two elements $D_1 = T \circ \Lambda_{\lambda_1} \circ S$ and 
$D_2 = T \circ \Lambda_{\lambda_2} \circ S$ from $\mathcal{A}$. One natural idea 
is then to look for $L$ such that: 

$$L \circ D_1 = D_2, \tag{9}$$


since it can be easily checked that 
$L_0 = T \circ \Lambda_{\lambda_2 \lambda_1^{-1}} \circ T^{-1}$ is a particular 
solution of (9). However, the solution space of (9) is not restricted to 
conjugates of multiplications. This is due to the ‘embedding’ mechanism, i.e. the 
fact that the mapping $S$ is not a one-to-one mapping, which releases some of the 
constraints and allows less structured linear mappings to be solutions. 

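A possible direction to solve this issue is to put more constraints on the 
mapping $L$ while being careful to keep mappings of the form 
$T \circ \Lambda_\lambda \circ T^{-1}$ in the solution space. This is why we do 
not look for a linear mapping that solves (9) only, but one that solves several 
equations similar to (9) simultaneously. This can be reformulated in terms of 
$\mathcal{A}$ as follows. We look for linear mappings $L$ such that: 

$$\forall i \in \{1, \dots, m\}, \quad L \circ (T \circ \Lambda_{\lambda_i} \circ S) \in \langle T \circ \Lambda_{\lambda_{m+1}} \circ S, \dots, T \circ \Lambda_{\lambda_{n-r+1}} \circ S \rangle, \tag{10}$$

that is, the image through $L$ of $m$ elements of $\mathcal{A}$ must lie in the 
vector space spanned by the remaining elements of $\mathcal{A}$. It is easy to 
see that if $\lambda$ is such that: 

$$\forall i \in \{1, \dots, m\}, \quad \lambda \cdot \lambda_i \in \langle \lambda_{m+1}, \dots, \lambda_{n-r+1} \rangle, \tag{11}$$

then $T \circ \Lambda_\lambda \circ T^{-1}$ must be a solution of (10). 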

The parameter $m$ controls the number of solutions of (10) and (11). It can 
be used to simultaneously render system (11) under-determined and system (10) 
over-determined. This ensures that there are no solutions other than the 
conjugates of multiplications. We can determine suitable values of $m$ as 
follows. For $i \le m$, the fact that $\lambda \cdot \lambda_i$ lies in 
$\langle \lambda_{m+1}, \dots, \lambda_{n-r+1} \rangle$ puts 
$n - ((n - r + 1) - m)$ constraints on the $n$ coordinates of $\lambda$ in 
$\mathbb{F}_q$. As $\lambda_1, \dots, \lambda_{n-r+1}$ are linearly independent, 
the above constraints are independent. Hence (11) admits solutions as soon as 
$n > m(n - (n - r + 1 - m))$. Similarly, the whole space of linear mappings $L$ 
has dimension $n^2$ and each equation of (10) puts 
$n(n - r) - (n - r + 1 - m)$ constraints, as mappings from $\mathcal{A}$ map 
$\mathbb{F}_q^{n-r}$ to $\mathbb{F}_q^n$. Therefore, system (10) is 
over-determined as soon as $n^2 < m(n(n - r) - (n - r + 1 - m))$. These two 
conditions define a range of values of $m$ such that the solution space of (10) 
becomes isomorphic to the solution space of (11). This behavior is entirely 
confirmed by our Magma implementation of the attack. 
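The two counting conditions can be evaluated directly to see which values of $m$ they allow. The Python sketch below assumes the conditions read $n > m\,(n - (n - r + 1 - m))$ and $n^2 < m\,(n(n - r) - (n - r + 1 - m))$; `admissible_m` is a hypothetical helper name.

```python
def admissible_m(n, r):
    """Values of m satisfying both counting conditions: the system on
    lambda stays under-determined while the system on L is over-determined."""
    ms = []
    for m in range(1, n - r + 1):
        free = n - r + 1 - m              # size of the spanning family
        under = n > m * (n - free)
        over = n * n < m * (n * (n - r) - free)
        if under and over:
            ms.append(m)
    return ms

# Sizes used by the Magma script of Appendix C: n = 37, r = 3.
print(admissible_m(37, 3))
```

For these sizes the smallest admissible value is $m = 2$, in line with the remark in Section 4.4 that $m = 2$ was enough in practice.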




4.3 Recovery of the Secret Elements 


Once a linear mapping $L = T \circ \Lambda_\lambda \circ T^{-1}$ has been 
recovered, every element of the secret key can be computed. By proceeding just as 
for the signature scheme, the underlying multiplication $\lambda$ is revealed 
from the characteristic polynomial of $L$. An equivalent representation $T_0$ of 
$T$ is then recovered by solving for $T$ in 
$L \circ T = T \circ \Lambda_\lambda$. Let $a$ be a randomly chosen element. The 
other components of the secret key can then be found via: 

$$S(a) = \sqrt{T_0^{-1}(P_2(a))}, \qquad s_0 = \frac{1}{S(a)}\, T_0^{-1}(P_1(a)), \qquad S_0 = \frac{1}{s_0}\, T_0^{-1} \circ P_1.$$

(In the case where $T_0^{-1}(P_2(a))$ is not a square in $\mathbb{E}$, just 
replace $T_0$ by $-T_0$.) 


4.4 Practical Parameters 


The most time-consuming step of our attack is to compute the solution space 
of (10), which requires $O(\log^2(q)\, n^6)$ operations. The authors of the square 
encryption scheme claimed 80-bit security for the following parameter sets: 

[Table: parameter set 1 | parameter set 2; rows include the field size q and the number of polynomials n] 


but the complexity of our attack actually is about 2°° operations for the first 
parameter set and about 2°° for the second. Again, the key recovery written in 
Magma only requires a couple of seconds to complete on a standard workstation. 
During the attack, m = 2 was enough in practice to ensure that only conjugates 
of multiplications were solutions. 


A Simple Auxiliary Functions for Our Magma Scripts 


Simple functions. The following function returns a root of $ax^2 + bx + c$. 

    Solve_2nd_degree:=function(a, b, c)
      is_, sqrt_delta:=IsSquare(b^2-4*a*c);
      return is_, (is_ select (-b+sqrt_delta)/(2*a) else 0);
    end function;

Juggling between matrices and vectors: 

    Mat2Vec:=func< mat | Vector(Eltseq(mat)) >;
    Vec2Mat:=func< vect, ncol | Matrix(ncol, Eltseq(vect)) >;

Space returns the vector space spanned by a set of matrices MS viewed as vectors: 

    Space:=func< MS, KK, dim |
      sub<VectorSpace(KK, dim)|[Mat2Vec(MS[i]) : i in [1..#MS]]> >;

The following returns the matrix of $x \mapsto \lambda x$: 

    MulBy:=func< lam, EtoV, VtoE, B |
      Matrix([EtoV(VtoE(B[i])*lam) : i in [1..#B]]) >;


Sequences of coefficients. It can be convenient to represent a quadratic 
polynomial as sequences of coefficients of its homogeneous components of degree 
0, 1, and 2. Co12 takes a function P viewed as a sequence of n_pol polynomials in 
n_var variables and outputs the corresponding sequences CS0, CS1, and CS2: 

    Co12:=function(KK, V_input, P, n_pol, n_var)
      CS0:=[KK ! 0 : ii in [1..n_pol]];
      CS1:=[[KK ! 0 : i in [1..n_var]] : ii in [1..n_pol]];
      CS2:=[[[KK ! 0 : j in [1..i]] : i in [1..n_var]] : ii in [1..n_pol]];
      x:=V_input ! 0; y:=P(x);
      for ii:=1 to n_pol do CS0[ii]:=y[ii]; end for;   // constant
      for i:=1 to n_var do
        x:=V_input ! 0; x[i]:=KK ! 1; y1:=P(x); x[i]:=KK ! -1; y2:=P(x);
        for ii:=1 to n_pol do
          CS1[ii][i]:=(y1[ii]-y2[ii])*(KK!2)^-1;             // coefficient of x_i
          CS2[ii][i][i]:=(y1[ii]+y2[ii])*(KK!2)^-1-CS0[ii];  // and x_i^2
        end for;
      end for;
      for i:=2 to n_var do for j:=1 to i-1 do
        x:=V_input ! 0; x[i]:=KK ! 1; x[j]:=KK ! 1; y:=P(x);
        for ii:=1 to n_pol do
          CS2[ii][i][j]:=y[ii]-CS2[ii][i][i]-CS2[ii][j][j]-
                         CS1[ii][i]-CS1[ii][j]-CS0[ii];      // and x_i*x_j, i ne j
        end for;
      end for; end for;
      return CS0, CS1, CS2;
    end function;

Given three sequences of coefficients C0, C1, and C2 defined with respect to 
a quadratic polynomial P as above, compute the value taken by P on input x: 

    Eval:=func< C2, C1, C0, x, n_var |
      &+[ &+[C2[i][j]*x[i]*x[j] : j in [1..i]] : i in [1..n_var]]
      + &+[C1[i]*x[i] : i in [1..n_var]] + C0 >;

The next function computes the coefficients of the differential associated to 
the homogeneous form of degree 2 specified by the sequence of its coefficients: 

    Diff:=function(CS2, KK, n_pol, n_var)
      DP:=[ZeroMatrix(KK, n_var, n_var) : ii in [1..n_pol]];
      for ii:=1 to n_pol do
        for i:=1 to n_var do
          DP[ii][i,i]:=2*CS2[ii][i][i];
          for j:=1 to i-1 do
            DP[ii][i,j]:=CS2[ii][i][j]; DP[ii][j,i]:=CS2[ii][i][j];
      end for; end for; end for;
      return DP; end function;


B Magma Script to Attack the Signature Scheme 


An extension E of degree n over the base field K, also viewed as vector space V: 

    q:=31; n:=31; v:=4; r:=3; K:=GF(q); E:=ext<K|n>;
    V,E2V:=VectorSpace(E,K); V2E:=E2V^-1;
    V_input:=VectorSpace(K,n+v); V_vinegar:=VectorSpace(K,v);
    V_message:=VectorSpace(K,n-r); V_random:=VectorSpace(K,r);


We then randomly draw a secret key: the coefficient $\alpha$, the affine mapping 
$\beta$, and the quadratic mapping $\gamma$ to form the internal transformation 

$$F : (X, X_v) \mapsto \alpha X^2 + \beta(X_v)\, X + \gamma(X_v),$$

    alpha:=V2E(A[1]) where A is Random(GL(n,K));   // ensures alpha ne 0
    beta0:=Random(E); beta1:=[Random(E) : i in [1..v]];
    beta:=func< Xv | &+[beta1[i]*Xv[i] : i in [1..v]] + beta0 >;
    gamma0:=Random(E); gamma1:=[Random(E) : i in [1..v]];
    gamma2:=[[Random(E) : j in [1..i]] : i in [1..v]];
    gamma:=func< Xv |
      &+[ &+[gamma2[i][j]*Xv[i]*Xv[j] : j in [1..i]] : i in [1..v]] +
      &+[gamma1[i]*Xv[i] : i in [1..v]] + gamma0 >;
    F:=func< X,Xv | alpha*X^2+beta(Xv)*X+gamma(Xv) >;


and randomly draw input and output linear layers S and T: 

    S1:=Random(GL(n+v, K)); S0:=Random(V_input);
    T1:=Random(GL(n, K)); T0:=Random(V);

The corresponding public key is obtained via $P = T \circ F \circ S$: 

    P:=function(input)
      XX:=input*S1+S0;
      X:=V2E(Vector([XX[i] : i in [1..n]]));    // normal variables
      Xv:=Vector([XX[i] : i in [n+1..n+v]]);    // vinegar variables
      return E2V( F(X,Xv) )*T1+T0;
    end function;


The coefficients of the homogeneous parts for the set of forms corresponding to P 
are obtained via: 

    PubC0, PubC1, PubC2:=Co12(K, V_input, P, n-r, n+v);

We are now able to verify whether a signature is valid: 

    Verify:=function(msg, sig)
      m:=[ Eval(PubC2[i], PubC1[i], PubC0[i], sig, n+v) : i in [1..n-r]];
      return &and[ m[i] eq msg[i] : i in [1..n-r]];
    end function;

We now compute an equivalent secret key. First, we look for the linear mappings 
Mx satisfying: Mx * DP - DP * Transpose(Mx) = 0. 

    B:=Basis(V); PR:=PolynomialRing(K, (n+v)^2);
    Mx:=Matrix(n+v, [PR.i : i in [1..(n+v)^2]]);
    DP:=Diff(PubC2, K, n-r, n+v);
    Eqs:=[Eltseq(Mx*DP[ii]-DP[ii]*Transpose(Mx)) : ii in [1..n-r]];
    GB:=[];
    for ii:=1 to n-r do
      GB:=GroebnerBasis(GB cat Eqs[ii]);
      if #GB + n + 1 eq (n+v)^2 then break; end if;
    end for;


We choose a particular solution M_ by removing the n + 1 degrees of freedom, 
fixing the remaining unknowns to random values, and extract the two roots 
c in K and a in E of the characteristic polynomial of M_. 

    repeat W:=GroebnerBasis([PR.((n+v)^2-i) + Random(K) : i in [0..n]] cat GB);
    until not(W eq [PR ! 1]);   // complete consistently
    M_:=Matrix(n+v, [K ! Evaluate(W[i], PR.i, 0) : i in [1..(n+v)^2]]);
    CPol:=FactoredCharacteristicPolynomial(M_);
    if not(#CPol eq 2) then "Bad Char. Pol."; exit; end if;
    c:=Roots(CPol[1][1])[1][1];                        // factor of degree 1
    a:=Roots(PolynomialRing(E) ! CPol[2][1])[1][1];    // of degree n
M_ must be similar to the matrix of $(X, X_v) \mapsto (aX, cX_v)$, which will 
disclose a particular solution S_ as useful to sign as S: 

    A:=MulBy(a, E2V, V2E, B);
    is_similar,S_:=IsSimilar(M_, DiagonalJoin(A, ScalarMatrix(v,c)) );
    if not(is_similar) then "Recovering S_ failed."; exit; end if;

Applying the change of basis S_, we get $(Z, Z_v) \mapsto T(Z^2 + \beta \cdot Z + \gamma(Z_v))$: 

    Z:=Vector([PR.i : i in [1..n+v]])*MatrixAlgebra(PR, n+v) ! (S_);
    PubZ:=[Eval(PubC2[i], PubC1[i], PubC0[i], Z, n+v) : i in [1..n-r]];
To get rid of the term $\beta \cdot Z$, we look for Y such that the coefficient 
of Z in $T((Z+Y)^2 + \beta \cdot (Z+Y) + \gamma(Z_v))$ becomes zero: 

    V0:=Random(V_vinegar);
    ZPY:=[PR!0 : i in [1..(n+v)^2]];      // Z+Y
    for i:=1 to n do ZPY[i]:=PR.i+PR.(i+n+v); end for;
    for i:=1 to v do ZPY[i+n]:=V0[i]; end for;
    PubZV:=[Evaluate(PubZ[i], ZPY) : i in [1..n-r]];
    OY:=[PR!0 : i in [1..(n+v)^2]];       // (Z,Y) = (0,Y)
    for i:=1 to n do OY[i+n+v]:=PR.(i+n+v); end for;
    EqLin:=&cat[[Evaluate(Coefficient(PubZV[i], PR.j, 1), OY)
            : j in [1..n]] : i in [1..n-r]];   // equations 2Y + beta = 0
    Y0:=GroebnerBasis(EqLin);
    beta_:=Vector([K ! Evaluate(Y0[i],PR.(i+n+v),0) : i in [1..n]]);


We are now able to get the polynomials corresponding to T(Z² + γ(Z_v)):

for i:=1 to n do ZPY[i]:=PR.i-beta_[i]; end for;
PubZ0:=[Evaluate(PubZ[i], ZPY) : i in [1..n-r]];

We recover go= 7(0) (remember vinegar part of ZPY was set to zero above): 
q0:=[K ! Evaluate(PubZ0[i], [PR ! 0 : j in [1..(n+v)^2]]) : i in [1..n-r]];

and thus Z ↦ T(Z²) together with its differential (X,Y) ↦ 2XY

PubZ2:=[PubZ0[i]-q0[i] : i in [1..n-r]];
DPubZ2:=[Submatrix(S_*DP[i]*Transpose(S_),1,1,n,n) : i in [1..n-r]];

but also (X,Y) ↦ 2a²XY:

DPubZA:=[A*DPubZ2[i]*Transpose(A) : i in [1..n-r]];


This allows us to complete T into a full rank mapping T_ via T_(X) = ½ DP(X, 1):

SPA:=Space(DPubZ2 cat DPubZA, K, n*n); SP2:=Space(DPubZ2, K, n*n);
W:=Basis(Complement(SPA, SP2));
DPPlus:=DPubZ2 cat [Vec2Mat(W[i], n) : i in [1..#W]];
T_:=(K ! 2)^-1*Matrix([Vector([(B[i]*DPPlus[j], B[1])
    : j in [1..n]]) : i in [1..n]]);

and to forge a signature for any message:

msg:=Random(V_Message);
repeat
  Y:=Vector(Eltseq(msg-Vector(q0)) cat Eltseq(Random(V_Random)));
  is_square, sqrX:=IsSquare(V2E(Y*T_^-1)); until is_square;
forged:=Vector(Eltseq(E2V(sqrX)-beta_) cat Eltseq(V0))*S_;
if Verify(msg, forged) then "Forged signature."; end if;
C Magma Script to Attack the Encryption Scheme


q:=31; n:=37; r:=3; K:=GF(q); E:=ext<K|n>;
VI:=VectorSpace(K,n-r); VO, K2V:=VectorSpace(E,K); V2K:=K2V^-1;

Build the secret key, the encryption function P, and coefficients:

L1:=Submatrix(Random(GL(n, K)),1,1,n-r,n); L2:=Random(GL(n, K));
l1:=Random(GL(n, K))[1]; l2:=Random(VO);
PEncrypt:=func< plain | K2V(V2K(plain*L1+l1)^2)*L2+l2 >;
PubC0, PubC1, PubC2 := Coi2(K, VI, PEncrypt, n, n-r);

The mappings Δ = {DP_{y_i}} for i in [1, n-r], for linearly independent y_1, ..., y_{n-r}:

DP:=Diff(PubC2, K, n, n-r); Y:=Random(GL(n-r, K));
Delta:=[Transpose(Matrix([Y[k]*DP[i] : i in [1..n]])) : k in [1..n-r]];

The set Λ of linear mappings verifying (Q) for some parameter m:

m:=2; delta:=[Delta[i] : i in [m+1..n-r]]; SP:=Space(delta, K, (n-r)*n);
Dual:=Transpose(NullspaceMatrix(Transpose(BasisMatrix(SP))));
P1:=Transpose(Matrix(PubC1)); B:=Basis(VectorSpace(K,n^2));
MMul:=func<A | Matrix([Mat2Vec(A*Vec2Mat(B[i],n)) : i in [1..#B]])>;
Lambda:=&meet[Nullspace(MMul(Delta[i])*Dual) : i in [1..m]]
        meet Nullspace(MMul(P1)*Dual);

Compute the characteristic polynomial CP of a random linear mapping in Λ:

M:=Vec2Mat(Random(Lambda), n); CP:=FactoredCharacteristicPolynomial(M);
a:=Roots(PolynomialRing(E) ! CP[1][1])[1][1];
A:=MulBy(a, K2V, V2K, Basis(VO));

Recover the secret elements:

res, L2_:=IsSimilar(M, A); R:=Random(VI);
v:=V2K(Vector([(R*DP[j], R) : j in [1..n]])*L2_^-1)/2;
res, s:=IsSquare(v);
if not res then L2_:=-L2_; res, s:=IsSquare(-v); end if;
l1_:=K2V(V2K(R*P1*L2_^-1)/(2*s));
L1_:=P1*L2_^-1*MulBy(1/V2K(2*l1_), K2V, V2K, Basis(VO));
l2_:=PEncrypt(VI ! 0)-K2V(V2K(l1_)^2)*L2_;
ImL1_:=sub<VO|[L1_[i] : i in [1..n-r]]>;

Disclose:=function(cipher) // illegitimate decryption!
  is_square, root:=IsSquare(V2K((cipher-l2_)*L2_^-1));
  if is_square then Z:=K2V(root);
    if (Z-l1_) in ImL1_ then return true, Solution(L1_, Z-l1_);
    else if (-Z-l1_) in ImL1_ then return true, Solution(L1_, -Z-l1_);
    else return false, _; end if; end if; else return false, _; end if;
end function;

plain:=Random(VI); b, p:=Disclose(PEncrypt(plain));
if b and (p eq plain) then "Decryption successful."; end if;
Abstract. We present a new algorithm based on binary quadratic forms
to factor integers of the form $N = pq^2$. Its heuristic running time is
exponential in the general case, but becomes polynomial when special
(arithmetic) hints are available, which is exactly the case for the so-called
NICE family of public-key cryptosystems based on quadratic fields introduced
in the late 90s. Such cryptosystems come in two flavours, depending
on whether the quadratic field is imaginary or real. Our factoring
algorithm yields a general key-recovery polynomial-time attack on NICE,
which works for both versions: Castagnos and Laguillaumie recently
obtained a total break of imaginary-NICE, but their attack could not apply
to real-NICE. Our algorithm is rather different from classical factoring
algorithms: it combines Lagrange’s reduction of quadratic forms with a
provable variant of Coppersmith’s lattice-based root finding algorithm for
homogeneous polynomials. It is very efficient given either of the following
arithmetic hints: the public key of imaginary-NICE, which provides an
alternative to the CL attack; or the knowledge that the regulator of the
quadratic field $\mathbb{Q}(\sqrt{p})$ is unusually small, just like in real-NICE.


Keywords: Public-key Cryptanalysis, Factorisation, Binary Quadratic 
Forms, Homogeneous Coppersmith’s Root Finding, Lattices. 


1 Introduction 


Many public-key cryptosystems require the hardness of factoring large integers
of the special form $N = pq^2$, such as Okamoto’s Esign [Oka90], Okamoto and
Uchiyama’s encryption [OU98], Takagi’s fast RSA variants [Tak98], and the large
family (surveyed in [BTV04]) of cryptosystems based on quadratic fields, which
was initiated by Buchmann and Williams’ key exchange [BW88], and which
includes the NICE cryptosystems (whose main feature
is a quadratic decryption). These moduli are popular because they can lead
to special functionalities (like homomorphic encryption) or improved efficiency
(compared to RSA). And no significant weakness has been found compared to
standard RSA moduli of the form $N = pq$: to the best of our knowledge, the only
results on $pq^2$ factorisation are [PO90, Per01, BDH99]. More precisely,
[Per01] obtained a linear speed-up of Lenstra’s ECM, and [BDH99, Sect. 6] can
factor in time $\tilde{O}(N^{1/9})$ when p and q are balanced. Furthermore, computing
the “squarefree part” of an integer (that is, given $N \in \mathbb{N}$ as input, compute
$(r, s) \in \mathbb{N}^2$ such that $N = r^2 s$ with s squarefree) is a classical problem in
algorithmic number theory (cf. [AM94]), because it is polynomial-time equivalent
to determining the ring of integers of a number field [Chi89].

However, some of these cryptosystems actually provide additional information
(other than N) in the public key, which may render factorisation easy.
For instance, Howgrave-Graham showed that the public key of 
disclosed the secret factorisation in polynomial time, using the gcd extension 
of Coppersmith’s root finding method Cop]. Very recently, Castagnos and 
Laguillaumie showed that the public key in the imaginary version 
of NICE allowed to retrieve the secret factorisation in polynomial 
time. And this additional information in the public key was crucial to make 
the complexity of decryption quadratic in imaginary-NICE, which was the main 
claimed benefit of NICE. But surprisingly, the attack of Castagnos and
Laguillaumie does not work against REAL-NICE [JSW08], which is the version of
NICE with real (rather than imaginary) quadratic fields, and which also offers
quadratic decryption. In particular, the public key of REAL-NICE only consists
of $N = pq^2$, but the prime p has special arithmetic properties.
OUR RESULTS. We present a new algorithm to factor integers of the form
$N = pq^2$, based on binary quadratic forms (or equivalently, ideals of orders of
quadratic number fields). In the worst case, its heuristic running time is
exponential, namely $O(p^{1/2})$. But in the presence of special hints, it becomes
heuristically polynomial. These hints are different from the usual ones of
lattice-based factoring methods where they are a fraction of the bits of the
secret prime factors. Instead, our hints are arithmetic, and correspond exactly
to the situation of NICE, including both the imaginary
and real versions [JSW08]. This gives rise to the first general key-recovery
polynomial-time attack on NICE, using only the public key.

More precisely, our arithmetic hints can be either of the following two: 


i. The hint is an ideal equivalent to a secret ideal of norm $q^2$ in an imaginary
quadratic field of discriminant $-pq^2$: in NICE, such an ideal is disclosed by the
public key. This gives an alternative attack on NICE, different from [CL09].

ii. The hint is the knowledge that the regulator of the quadratic field
$\mathbb{Q}(\sqrt{p})$ is unusually small, just like in REAL-NICE. Roughly speaking, the
regulator is a real number which determines how “dense” the units of the ring
of integers of the number field $\mathbb{Q}(\sqrt{p})$ are. This number is known to lie in
the large interval

\[ \left[\, \log\left(\tfrac{1}{2}\left(\sqrt{p-4} + \sqrt{p}\right)\right),\ \sqrt{\tfrac{1}{2}p}\left(\tfrac{1}{2}\log p + 1\right) \right]. \]

But for infinitely many p (including square-free numbers of the form
$p = k^2 + r$, where $p > 5$, $r \mid 4k$ and $-k < r \le k$, see [Deg58]), the regulator
is at most polynomial in $\log p$. For these unusually small regulators, our
algorithm heuristically runs in time polynomial in the bit-length of $N = pq^2$,
which gives the first total break of REAL-NICE [JSW08]. We stress that although
such p’s are easy to construct, their density is believed to be arbitrarily small.


Interestingly, our algorithm is rather different from classical factoring
algorithms. It is a combination of Lagrange’s reduction of quadratic forms with a
provable variant of Coppersmith’s lattice-based root finding algorithm
for homogeneous polynomials. In a nutshell, our factoring method first looks for
a reduced binary quadratic form $f(x, y) = ax^2 + bxy + cy^2$ properly representing
$q^2$ with small coefficients, i.e. there exist small coprime integers $x_0$ and $y_0$
such that $q^2 = f(x_0, y_0)$. In case i., such a quadratic form is already given. In
case ii., such a quadratic form is found by a walk along the principal cycle of
the class group of discriminant $pq^2$, using Lagrange’s reduction of (indefinite)
quadratic forms. Finally, the algorithm finds such small coprime integers $x_0$ and
$y_0$ such that $q^2 = f(x_0, y_0)$, by using the fact that $\gcd(f(x_0, y_0), pq^2)$ is large.
This discloses $q^2$ and therefore the factorisation of N. In both cases, the search
for $x_0$ and $y_0$ is done with a new rigorous homogeneous bivariate variant of
Coppersmith’s method, which might be of independent interest: by the way, it was
pointed out to us that Bernstein [Ber08] independently used a similar method
in the different context of Goppa codes decoding.

Our algorithm requires “natural” bounds on the roots of reduced quadratic 
forms of a special shape. We are unable to prove rigorously all these bounds, 
which makes our algorithm heuristic (like many factoring algorithms). But we 
have performed many experiments supporting such bounds, and the algorithm 
works very well in practice. 


FACTORISATION AND QUADRATIC FORMS. Our algorithm is based on quadratic
forms, which share a long history with factoring (see [CP01]). Fermat’s factoring
method represents N in two intrinsically different ways by the quadratic form
$x^2 + y^2$. It has been improved by Shanks with SQUFOF, whose complexity is
$O(N^{1/4})$ (see for a detailed analysis). Like ours, this method works
with the infrastructure of a class group of positive discriminant, but is different
in spirit since it searches for an ambiguous form (after having found a square
form), and does not focus on discriminants of a special shape. Schoof’s factoring
algorithms are also essentially looking for ambiguous forms. One is based
on computation in class groups of complex quadratic orders and the other is
close to SQUFOF since it works with real quadratic orders by computing a
good approximation of the regulator to find an ambiguous form. Like SQUFOF,
this algorithm does not take advantage of working in a non-maximal order
and is rather different from our algorithm. Both of Schoof’s algorithms run in
$O(N^{1/5})$ under the generalised Riemann hypothesis. McKee’s method
is a speedup of Fermat’s algorithm (and was presented as an alternative to
SQUFOF) with a heuristic complexity of $O(N^{1/4})$ instead of $O(N^{1/2})$.

SQUFOF and other exponential methods are often used to factor small numbers
(say 50 to 100 bits), for instance in the post-sieving phase of the Number
Field Sieve algorithm. Some interesting experimental comparisons can be found
in [Mil07]. Note that the currently fastest rigorous deterministic algorithm
actually has exponential complexity: it is based on a polynomial evaluation method
(for a polynomial of the form $x(x-1)\cdots(x-B+1)$ for some bound B) and its
best variant is described in [BGS07]. Finally, all sieve factoring algorithms are
somewhat related to quadratic forms, since their goal is to find random pairs
(x, y) of integers such that $x^2 \equiv y^2 \bmod N$. However, these algorithms factor
generic numbers and have a subexponential complexity.


ROAD MAP. The rest of the paper is organised as follows. The first section
recalls facts on quadratic fields and quadratic forms, and presents our heuristic
supported by experiments. The next section describes the homogeneous
Coppersmith method and the following one exhibits our main result: the factoring
algorithm. The last section consists of the two cryptanalyses of cryptosystems
based on real quadratic fields (REAL-NICE) and on imaginary quadratic fields (NICE).


2 Background on Quadratic Fields and Quadratic Forms 


2.1 Quadratic Fields 


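Let $D \neq 0, 1$ be a squarefree integer and consider the quadratic number field
$K = \mathbb{Q}(\sqrt{D})$. If $D < 0$ (resp. $D > 0$), K is called an imaginary (resp. a real)
quadratic field. The fundamental discriminant $\Delta_K$ of K is defined as $\Delta_K = D$
if $D \equiv 1 \pmod 4$ and $\Delta_K = 4D$ otherwise. An order $\mathcal{O}$ in K is a subset of K
such that $\mathcal{O}$ is a subring of K containing 1 and $\mathcal{O}$ is a free $\mathbb{Z}$-module of rank
2. The ring $\mathcal{O}_{\Delta_K}$ of algebraic integers in K is the maximal order of K. It can
be written as $\mathbb{Z} + \omega_K\mathbb{Z}$, where $\omega_K = \frac{1}{2}(\Delta_K + \sqrt{\Delta_K})$. If we set
$f = [\mathcal{O}_{\Delta_K} : \mathcal{O}]$ the finite index of any order $\mathcal{O}$ in $\mathcal{O}_{\Delta_K}$, then
$\mathcal{O} = \mathbb{Z} + f\omega_K\mathbb{Z}$. The integer f is called the conductor of $\mathcal{O}$. The discriminant
of $\mathcal{O}$ is then $\Delta_f = f^2\Delta_K$. Now, let $\mathcal{O}_\Delta$ be an order of discriminant $\Delta$ and
$\mathfrak{a}$ be a nonzero ideal of $\mathcal{O}_\Delta$; its norm is $N(\mathfrak{a}) = |\mathcal{O}_\Delta/\mathfrak{a}|$. A fractional ideal
is a subset $\mathfrak{a} \subset K$ such that $d\mathfrak{a}$ is an ideal of $\mathcal{O}_\Delta$ for some $d \in \mathbb{N}$. A fractional
ideal $\mathfrak{a}$ is said to be invertible if there exists another fractional ideal
$\mathfrak{b}$ such that $\mathfrak{a}\mathfrak{b} = \mathcal{O}_\Delta$. The ideal class group of $\mathcal{O}_\Delta$ is
$C(\mathcal{O}_\Delta) = I(\mathcal{O}_\Delta)/P(\mathcal{O}_\Delta)$, where $I(\mathcal{O}_\Delta)$ is the group of invertible fractional
ideals of $\mathcal{O}_\Delta$ and $P(\mathcal{O}_\Delta)$ the subgroup consisting of principal ideals. Its
cardinality is the class number of $\mathcal{O}_\Delta$, denoted by $h(\mathcal{O}_\Delta)$. A nonzero ideal $\mathfrak{a}$
of $\mathcal{O}_\Delta$ is said to be prime to f if $\mathfrak{a} + f\mathcal{O}_\Delta = \mathcal{O}_\Delta$. We denote by $I(\mathcal{O}_\Delta, f)$
the subgroup of $I(\mathcal{O}_\Delta)$ of ideals prime to f. The group $\mathcal{O}_\Delta^*$ of units in $\mathcal{O}_\Delta$
is equal to $\{\pm 1\}$ for all $\Delta < 0$, except when $\Delta$ is equal to $-3$ and $-4$
($\mathcal{O}_{-3}^*$ and $\mathcal{O}_{-4}^*$ are respectively the groups of sixth and fourth roots of
unity). When $\Delta > 0$, then $\mathcal{O}_\Delta^* = \langle -1, \varepsilon_\Delta \rangle$ where $\varepsilon_\Delta > 0$ is called the
fundamental unit. The real number $R_\Delta = \log(\varepsilon_\Delta)$ is the regulator of $\mathcal{O}_\Delta$. The
following important bounds on the regulator of a real quadratic field can be
found in [JLW95]:

\[ \log\left(\tfrac{1}{2}\left(\sqrt{\Delta - 4} + \sqrt{\Delta}\right)\right) \le R_\Delta < \sqrt{\tfrac{1}{2}\Delta}\left(\tfrac{1}{2}\log\Delta + 1\right). \qquad (1) \]

The lower bound is reached infinitely often, for instance with $\Delta = x^2 + 4$ with
$2 \nmid x$. Finally, this last proposition is the heart of both NICE and REAL-NICE.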


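Proposition 1 ([Cox99, Proposition 7.20], Theorem 2.16). Let
$\mathcal{O}_{\Delta_f}$ be an order of conductor f in a quadratic field K.

i. If $\mathfrak{A}$ is an $\mathcal{O}_{\Delta_K}$-ideal prime to f, then $\mathfrak{A} \cap \mathcal{O}_{\Delta_f}$ is an $\mathcal{O}_{\Delta_f}$-ideal
prime to f of the same norm.
ii. If $\mathfrak{a}$ is an $\mathcal{O}_{\Delta_f}$-ideal prime to f, then $\mathfrak{a}\mathcal{O}_{\Delta_K}$ is an $\mathcal{O}_{\Delta_K}$-ideal prime
to f of the same norm.
iii. The map $\varphi_f : I(\mathcal{O}_{\Delta_f}, f) \to I(\mathcal{O}_{\Delta_K}, f)$, $\mathfrak{a} \mapsto \mathfrak{a}\mathcal{O}_{\Delta_K}$ is an isomorphism.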


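The map $\varphi_f$ from Proposition 1 induces a surjection
$\bar\varphi_f : C(\mathcal{O}_{\Delta_f}) \to C(\mathcal{O}_{\Delta_K})$ which can be efficiently computed (see [PT00]).
In our settings, we will use a prime conductor $f = q$ and consider
$\Delta_q = q^2\Delta_K$, for a fundamental discriminant $\Delta_K$. In that case, the order of
the kernel of $\bar\varphi_q$ is given by the classical analytic class number formula
(see for instance [BV07])

\[ \frac{h(\mathcal{O}_{\Delta_q})}{h(\mathcal{O}_{\Delta_K})} =
   \begin{cases}
     q - (\Delta_K/q) & \text{if } \Delta_K < -4, \\
     (q - (\Delta_K/q))\,R_{\Delta_K}/R_{\Delta_q} & \text{if } \Delta_K > 0.
   \end{cases} \qquad (2) \]

Note that in the case of real quadratic fields, $\varepsilon_{\Delta_q} = \varepsilon_{\Delta_K}^t$ for a positive
integer t, hence $R_{\Delta_q}/R_{\Delta_K} = t$ and $t \mid (q - (\Delta_K/q))$.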
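
The imaginary case of the class number formula can be checked numerically on toy parameters. The following Python sketch is only an illustration under our own assumptions: the helper names `class_number` and `legendre` are hypothetical, the class number is obtained by naively counting primitive reduced forms of negative discriminant (feasible only for tiny discriminants), and the symbol (Δ_K/q) is computed as a Legendre symbol for an odd prime q.

```python
from math import gcd

def class_number(D):
    """Count primitive reduced forms [a, b, c] with b^2 - 4ac = D < 0."""
    assert D < 0 and D % 4 in (0, 1)
    count = 0
    a = 1
    while 3 * a * a <= -D:                 # a reduced form has a <= sqrt(|D|/3)
        for b in range(-a + 1, a + 1):     # normal form: -a < b <= a
            if (b * b - D) % (4 * a):
                continue                   # c would not be an integer
            c = (b * b - D) // (4 * a)
            if c < a or (a == c and b < 0):
                continue                   # not reduced
            if gcd(gcd(a, b), c) == 1:     # keep primitive forms only
                count += 1
        a += 1
    return count

def legendre(n, q):
    """Legendre symbol (n/q) for an odd prime q, via Euler's criterion."""
    r = pow(n % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r

# Toy check with Delta_K = -p and Delta_q = -p * q^2 (p = 3 mod 4, Delta_K < -4).
for p, q in [(7, 3), (11, 3)]:
    ratio = class_number(-p * q * q) // class_number(-p)
    print(p, q, ratio, q - legendre(-p, q))
```

For both toy pairs the two printed quantities agree, as the formula predicts; real NICE parameters are of course far beyond what this enumeration can handle.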


2.2 Representation of the Classes 


Working with ideals modulo the equivalence relation of the class group is
essentially equivalent to working with binary quadratic forms modulo $SL_2(\mathbb{Z})$
(cf. Section 5.2 of [Coh00]). Moreover, quadratic forms are more suited to an
algorithmic point of view. Every ideal $\mathfrak{a}$ of $\mathcal{O}_\Delta$ can be written as

\[ \mathfrak{a} = m\left(a\mathbb{Z} + \frac{-b + \sqrt{\Delta}}{2}\mathbb{Z}\right) \]

with $m \in \mathbb{Z}$, $a \in \mathbb{N}$ and $b \in \mathbb{Z}$ such that $b^2 \equiv \Delta \pmod{4a}$. In the remainder,
we will only consider primitive integral ideals, which are those with $m = 1$.
This notation also represents the binary quadratic form $ax^2 + bxy + cy^2$ with
$b^2 - 4ac = \Delta$. This representation of the ideal is unique if the form is normal
(see below). We recall here some facts about binary quadratic forms.


Definition 1. A binary quadratic form f is a degree 2 homogeneous polynomial
$f(x, y) = ax^2 + bxy + cy^2$ where a, b and c are integers, and is denoted by $[a, b, c]$.
The discriminant of the form is $\Delta = b^2 - 4ac$. If $a > 0$ and $\Delta < 0$, the form is
called definite positive and indefinite if $\Delta > 0$.

Let $M \in SL_2(\mathbb{Z})$ with $M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ and $f = [a, b, c]$ a binary quadratic form,
then f.M is the equivalent binary quadratic form $f(\alpha x + \beta y, \gamma x + \delta y)$.
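
To make this action concrete, here is a small Python sketch. The helper names `act` and `disc` are ours and not part of the paper; the coefficients of f.M are obtained by expanding f(αx + βy, γx + δy), and the example checks that a matrix of determinant 1 leaves the discriminant unchanged.

```python
def disc(form):
    """Discriminant b^2 - 4ac of the form [a, b, c]."""
    a, b, c = form
    return b * b - 4 * a * c

def act(form, M):
    """Coefficients of f.M = f(alpha*x + beta*y, gamma*x + delta*y)."""
    a, b, c = form
    (alpha, beta), (gamma, delta) = M
    return (
        a * alpha * alpha + b * alpha * gamma + c * gamma * gamma,
        2 * a * alpha * beta + b * (alpha * delta + beta * gamma) + 2 * c * gamma * delta,
        a * beta * beta + b * beta * delta + c * delta * delta,
    )

f = (9, 3, 2)                  # discriminant -63
M = ((0, -1), (1, 1))          # determinant 1
g = act(f, M)
print(g, disc(f), disc(g))
```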
Definite Positive Forms. Let us first define the crucial notion of reduction.

Definition 2. The form $f = [a, b, c]$ is called normal if $-a < b \le a$. It is called
reduced if it is normal, $a \le c$, and if $b \ge 0$ for $a = c$.

The procedure which transforms a form $f = [a, b, c]$ into a normal one consists
in setting s such that $b + 2sa$ belongs to the right interval (see (5.4))
and producing the form $[a, b + 2sa, as^2 + bs + c]$. Once a form $f = [a, b, c]$ is
normalised, a reduction step consists in normalising the form $[c, -b, a]$. We
denote this form by $\rho(f)$ and by Rho a corresponding algorithm. The reduction
then consists in normalising f, and then iteratively replacing f by $\rho(f)$ until f
is reduced. The time complexity of this (Lagrange-Gauß) algorithm is quadratic
(see [BV07]). It returns a reduced form g which is equivalent to f modulo $SL_2(\mathbb{Z})$.
We will call matrix of the reduction the matrix M such that $g = f.M$. The
reduction procedure yields a uniquely determined reduced form in the class modulo
$SL_2(\mathbb{Z})$.
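
The normalisation and reduction steps for definite positive forms can be sketched in a few lines of Python. This is a minimal illustration with our own function names (`normalize`, `rho`, `is_reduced`, `reduce_form`), assuming a > 0 and negative discriminant; it does not try to match the quadratic complexity quoted above.

```python
def normalize(a, b, c):
    """Return [a, b + 2sa, a s^2 + b s + c] with -a < b + 2sa <= a."""
    s = (a - b) // (2 * a)          # floor((a - b) / (2a))
    return a, b + 2 * s * a, a * s * s + b * s + c

def rho(a, b, c):
    """One reduction step: normalise the form [c, -b, a]."""
    return normalize(c, -b, a)

def is_reduced(a, b, c):
    return -a < b <= a <= c and (b >= 0 or a != c)

def reduce_form(a, b, c):
    """Normalise, then apply rho until the form is reduced."""
    a, b, c = normalize(a, b, c)
    while not is_reduced(a, b, c):
        a, b, c = rho(a, b, c)
    return a, b, c

# Toy form [q^2, kq, (k^2 + p)/4] with p = 7, q = 3, k = 1 (discriminant -63).
print(reduce_form(9, 3, 2))
```

On this toy input a single application of rho suffices; the same loop handles arbitrary definite positive forms with integer coefficients.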


Indefinite Forms. Our main result will deal with forms of positive
discriminant. Here is the definition of a reduced indefinite form.

Definition 3. The form $f = [a, b, c]$ of positive discriminant $\Delta$ is reduced if
$\left|\sqrt{\Delta} - 2|a|\right| < b < \sqrt{\Delta}$ and normal if $-|a| < b \le |a|$ for $|a| \ge \sqrt{\Delta}$, and
$\sqrt{\Delta} - 2|a| < b < \sqrt{\Delta}$ for $|a| < \sqrt{\Delta}$.

The reduction process is similar to the definite positive case. The time complexity
of the algorithm is still quadratic (see [BV07, Theorem 6.6.4]). It returns a
reduced form g which is equivalent to f modulo $SL_2(\mathbb{Z})$. The main difference
with forms of negative discriminant is that there will in general not exist a
unique reduced form per class, but several organised in a cycle structure, i.e.,
when f has been reduced then subsequent applications of $\rho$ give other reduced
forms.


Definition 4. Let f be an indefinite binary quadratic form, the cycle of f is
the sequence $(\rho^i(g))_{i \in \mathbb{Z}}$ where g is a reduced form which is equivalent to f.


From Theorem 6.10.3 of [BV07], the cycle of f consists of all reduced forms
in the equivalence class of f. Actually, the complete cycle is obtained by a finite
number of applications of $\rho$ as the process is periodic. It has been shown
that the period length $\ell$ of the sequence of reduced forms in each class
of a class group of discriminant $\Delta$ satisfies $\frac{R_\Delta}{\log \Delta} \le \ell \le \frac{2R_\Delta}{\log 2} + 1$.
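
The cycle structure is easy to observe on small discriminants. The Python sketch below is an illustration under our own assumptions: the function names are hypothetical, Δ is a positive non-square discriminant, normalisation follows Definition 3 using the integer part of √Δ, and the starting point is the form [1, b, (b² − Δ)/4] where b is the largest integer below √Δ with the same parity as Δ.

```python
from math import isqrt

def normalize(a, b, c, D):
    """Normalise the indefinite form [a, b, c] of discriminant D > 0."""
    r, m = isqrt(D), 2 * abs(a)
    if abs(a) > r:                       # |a| > sqrt(D): want -|a| < b' <= |a|
        nb = b % m
        if nb > abs(a):
            nb -= m
    else:                                # |a| < sqrt(D): want sqrt(D) - 2|a| < b' < sqrt(D)
        nb = r - ((r - b) % m)
    s = (nb - b) // (2 * a)              # exact division: nb = b + 2 s a
    return a, nb, a * s * s + b * s + c

def rho(form, D):
    a, b, c = form
    return normalize(c, -b, a, D)

def start_form(D):
    r = isqrt(D)
    b = r if (r - D) % 2 == 0 else r - 1
    return 1, b, (b * b - D) // 4

def cycle(D):
    """List the reduced forms met by iterating rho from the starting form."""
    start = start_form(D)
    forms, f = [start], rho(start, D)
    while f != start:
        forms.append(f)
        f = rho(f, D)
    return forms

print(cycle(13))
print(cycle(28))
```

For Δ = 13 the walk returns to its starting point after 2 steps, and for Δ = 28 after 4 steps.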

Our factoring algorithm will actually take place in the principal equivalence 
class. The following definition exhibits the principal form of discriminant A. 


Definition 5. The reduced form $[1, \lfloor\sqrt{\Delta}\rfloor, (\lfloor\sqrt{\Delta}\rfloor^2 - \Delta)/4]$ of discriminant $\Delta$
is called the principal form of discriminant $\Delta$, and will be denoted $\mathbf{1}_\Delta$.
2.3 Reduction of the Forms $[q^2, kq, (k^2 \pm p)/4]$ and Heuristics


In this subsection, p and q are two distinct primes of the same bit-size $\lambda$ and
$p \equiv 1 \bmod 4$ (resp. $p \equiv 3 \bmod 4$) when we deal with positive (resp. negative)
discriminant. Our goal is to factor the numbers $pq^2$ with the special normalised
quadratic forms $[q^2, kq, (k^2 + p)/4]$ or $[q^2, kq, (k^2 - p)/4]$, depending on whether
we work with a negative discriminant $\Delta_q = -pq^2$ or with a positive one
$\Delta_q = pq^2$. If p and q have the same size, these forms are clearly not reduced,
neither in the imaginary setting nor in the real one. But as we shall see, we can
find the reduced forms which correspond to the output of the reduction algorithm
applied on these forms.

Suppose that we know a form $\hat f_k$, either definite positive or indefinite, which
is the reduction of a form $f_k = [q^2, kq, (k^2 \pm p)/4]$ where k is an integer. Then
$\hat f_k$ represents the number $q^2$. More precisely, if
$M_k = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in SL_2(\mathbb{Z})$ is the
matrix such that $\hat f_k = f_k.M_k$, then $\hat f_k.M_k^{-1} = f_k$ and
$q^2 = f_k(1, 0) = \hat f_k(\delta, -\gamma)$.
In Section 3, we will see that provided they are relatively small compared to
$\Delta_q$, the values $\delta$ and $-\gamma$ can be found in polynomial time with a new variant
of Coppersmith’s method. Our factoring algorithm can be sketched as follows:
find such a form $\hat f_k$ and, if the coefficients of $M_k$ are sufficiently small, retrieve
$\delta$ and $-\gamma$ and the non-trivial factor $q^2$ of $\Delta_q$. In this paragraph, we give some
heuristics on the size of such a matrix $M_k$ and discuss their relevance. If M is a
matrix we denote by $|M|$ the max norm, i.e., the maximal coefficient of M in
absolute value.
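
The relation between the reduction matrix and the representation of q² can be checked on a toy example in the imaginary case. The Python sketch below uses our own helper names and toy primes (p = 103, q = 101, k = 5 are arbitrary choices, not parameters from the paper); it reduces f_k while accumulating the matrix of the reduction and then evaluates the reduced form at (δ, −γ).

```python
def mat_mul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def normalize(form, M):
    """Normalise [a, b, c] (a > 0) and update M with the shift x -> x + s*y."""
    a, b, c = form
    s = (a - b) // (2 * a)
    return (a, b + 2 * s * a, a * s * s + b * s + c), mat_mul(M, ((1, s), (0, 1)))

def is_reduced(form):
    a, b, c = form
    return -a < b <= a <= c and (b >= 0 or a != c)

def reduce_with_matrix(form):
    """Return (g, M) with g reduced and g = form.M, for a definite positive form."""
    M = ((1, 0), (0, 1))
    form, M = normalize(form, M)
    while not is_reduced(form):
        a, b, c = form
        # [c, -b, a] is f(-y, x), i.e. the action of ((0, -1), (1, 0)).
        form, M = normalize((c, -b, a), mat_mul(M, ((0, -1), (1, 0))))
    return form, M

def evaluate(form, x, y):
    a, b, c = form
    return a * x * x + b * x * y + c * y * y

p, q, k = 103, 101, 5                       # toy values, p = 3 mod 4, k odd
f_k = (q * q, k * q, (k * k + p) // 4)
f_hat, ((alpha, beta), (gamma, delta)) = reduce_with_matrix(f_k)
print(f_hat, evaluate(f_hat, delta, -gamma) == q * q)
```

The attack described in the text goes the other way: it starts from the reduced form and must recover (δ, −γ), which is where the root finding method of Section 3 comes in.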

In the imaginary case, it is shown in the proof of [CL09, Theorem 2] that
the forms $f_k$ belong to different classes of the kernel of the map $\bar\varphi_q$, depending
on k, so the reduced equivalent forms $\hat f_k$ are the unique reduced elements of the
classes of the kernel. To prove the correctness of our attack on NICE, we need
the following heuristic (indeed, the root finding algorithm of Section 3 recovers
roots up to $|\Delta_q|^{1/9}$):


Heuristic 1 (Imaginary case). Given a reduced element $\hat f_k$ of a nontrivial
class of $\ker \bar\varphi_q$, the matrix of reduction $M_k$ is such that $|M_k| < |\Delta_q|^{1/9}$ with
probability asymptotically close to 1.


In the full version, we prove a probabilistic version of Heuristic 1. From
Lemma 5.6.1 of [BV07], $|M_k| < 2\max\{q^2, (k^2 + p)/4\}/\sqrt{pq^2}$. As $f_k$ is
normalised, $|k| \le q$ and $|M_k| < 2q/\sqrt{p} \approx |\Delta_q|^{1/6}$. Note that we cannot reach such a
bound with our root finding algorithm. Experimentally, for random k, $|M_k|$ can
be much smaller. For example, if the bit-size $\lambda$ of p and q equals 100, the mean
value of $|M_k|$ is around $|\Delta_q|^{1/11.7}$. Our heuristic can be explained as follows.
A well-known heuristic in the reduction of positive definite quadratic forms (or
equivalently, two-dimensional lattices) is that if $[a, b, c]$ is a reduced quadratic
form of discriminant $\Delta$, then a and c should be close to $\sqrt{|\Delta|}$. This cannot hold for
all reduced forms, but it can be proved to hold for an overwhelming majority of
reduced forms. Applied to $\hat f_k = [a, b, c]$, this means that we expect a and c to be
close to $|\Delta_q|^{1/2}$. Now, recall that $q^2 = \hat f_k(\delta, -\gamma) = a\delta^2 - b\delta\gamma + c\gamma^2$, which leads
to $\delta$ and $\gamma$ close to $\sqrt{q^2/a} = q/\sqrt{a} \approx q/|\Delta_q|^{1/4} \approx |\Delta_q|^{1/12}$. Thus, we expect that
$|M_k| \le |\Delta_q|^{1/12}$. And this explains why we obtained experimentally the bound
$|\Delta_q|^{1/11.7}$. Figure 1(a) shows a curve obtained by experimentation, which gives
the probability that $|M_k| < |\Delta_q|^{1/9}$ for random k, in function of $\lambda$. This curve
also supports our heuristic.

In the real case, we prove in the following theorem that $R_{\Delta_q}/R_{\Delta_K}$ forms $f_k$
are principal and we exhibit the generators of the corresponding primitive ideals.


Theorem 1. Let $\Delta_K$ be a fundamental positive discriminant, $\Delta_q = \Delta_K q^2$ where
q is an odd prime conductor. Let $\varepsilon_{\Delta_K}$ (resp. $\varepsilon_{\Delta_q}$) be the fundamental unit of
$\mathcal{O}_{\Delta_K}$ (resp. $\mathcal{O}_{\Delta_q}$) and t such that $\varepsilon_{\Delta_K}^t = \varepsilon_{\Delta_q}$. Then the principal ideals of
$\mathcal{O}_{\Delta_q}$ generated by $q\varepsilon_{\Delta_K}^i$ correspond to quadratic forms
$f_{k(i)} = [q^2, k(i)q, (k(i)^2 - p)/4]$ with $i \in \{1, \ldots, t-1\}$, where k(i) is an integer
defined modulo 2q, computable from $\varepsilon_{\Delta_K}^i \bmod q$.


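Proof. Let $\alpha_i = q\varepsilon_{\Delta_K}^i$ with $i \in \{1, \ldots, t-1\}$. Following the proof of [BTW95,
Proposition 2.9], we detail here the computation of $\mathfrak{a}_i = \alpha_i\mathcal{O}_{\Delta_q}$. Let $x_i$ and $y_i$
be two integers such that $\varepsilon_{\Delta_K}^i = x_i + y_i\omega_K$. Then
$\alpha_i = qx_i + y_i q\Delta_K(1 - q)/2 + y_i(\Delta_q + \sqrt{\Delta_q})/2$, and $\alpha_i$ is an element of $\mathcal{O}_{\Delta_q}$.
Let $m_i$, $a_i$ and $b_i$ be three integers such that

\[ \mathfrak{a}_i = m_i\left(a_i\mathbb{Z} + \frac{-b_i + \sqrt{\Delta_q}}{2}\mathbb{Z}\right). \]

As mentioned in the proof of [BTW95, Proposition 2.9], $m_i$ is the smallest positive
coefficient of $\sqrt{\Delta_q}/2$ in $\mathfrak{a}_i$. As $\mathcal{O}_{\Delta_q}$ is equal to $\mathbb{Z} + (\Delta_q + \sqrt{\Delta_q})/2\,\mathbb{Z}$,
$\alpha_i\mathcal{O}_{\Delta_q}$ is generated by $\alpha_i$ and $\alpha_i(\Delta_q + \sqrt{\Delta_q})/2$ as a $\mathbb{Z}$-module. So a simple
calculation gives that $m_i = \gcd(y_i, q(x_i + y_i\Delta_K/2))$. As $\varepsilon_{\Delta_K}^i$ is not an element
of $\mathcal{O}_{\Delta_q}$, we have $\gcd(y_i, q) = 1$ so $m_i = \gcd(y_i, x_i + y_i\Delta_K/2)$. The same
calculation to find $m_i'$ for the ideal $\varepsilon_{\Delta_K}^i\mathcal{O}_{\Delta_K}$ reveals that $m_i' = m_i$. As
$\varepsilon_{\Delta_K}^i\mathcal{O}_{\Delta_K} = \mathcal{O}_{\Delta_K}$ we must have $m_i' = 1$. Now, $N(\mathfrak{a}_i) = |N(\alpha_i)| = q^2$
and $N(\mathfrak{a}_i) = m_i^2 a_i = a_i$ and therefore $a_i = q^2$. Let us now find $b_i$. Note that
$b_i$ is defined modulo $2a_i$. Since $\alpha_i \in \alpha_i\mathcal{O}_{\Delta_q}$, there exist $\mu_i$ and $\nu_i$ such that
$\alpha_i = a_i\mu_i + (-b_i + \sqrt{\Delta_q})/2\,\nu_i$. By identification in the basis $(1, \sqrt{\Delta_q})$, $\nu_i = y_i$
and by a multiplication by 2, we obtain $2qx_i + qy_i\Delta_K \equiv -b_iy_i \pmod{2a_i}$. As
$b_i \equiv \Delta_q \pmod 2$, we only have to determine $b_i$ modulo $q^2$. As $y_i$ is prime to
q, we have $b_i \equiv k(i)q \pmod{q^2}$ with $k(i) \equiv -2x_i/y_i - \Delta_K \pmod q$. Finally, as
we must have $-a_i < b_i \le a_i$ if $a_i > \sqrt{\Delta_q}$ and else $\sqrt{\Delta_q} - 2a_i < b_i < \sqrt{\Delta_q}$,
k(i) is the unique integer with $k(i) \equiv \Delta_q \pmod 2$ and $k(i) \equiv -2x_i/y_i - \Delta_K$
$\pmod q$, such that $b_i = k(i)q$ satisfies these inequalities. Eventually, the principal
ideal of $\mathcal{O}_{\Delta_q}$ generated by $q\varepsilon_{\Delta_K}^i$ corresponds to the form $[q^2, k(i)q, c_i]$ with
$c_i = (b_i^2 - \Delta_q)/(4a_i) = (k(i)^2 - \Delta_K)/4$.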


From this theorem, we see that if we go across the cycle of principal forms,
then we will find reduced forms $\hat f_{k(i)}$. To analyse the complexity of our
factoring algorithm, we have to know the distribution of these forms on the cycle.
An appropriate tool is the Shanks distance d (see Definition 10.1.4)
which is close to the number of iterations of Rho between two forms. One has
$d(\mathbf{1}_{\Delta_q}, f_{k(i)}) = iR_{\Delta_K}$. From Lemma 10.1.8 of [BV07], $|d(\hat f_{k(i)}, f_{k(i)})| < \log q$, for
all $i = 1, 2, \ldots, t-1$. Let j be the smallest integer such that $0 < jR_{\Delta_K} - 2\log q$,
then as $jR_{\Delta_K} = d(f_{k(i)}, f_{k(i+j)}) = d(f_{k(i)}, \hat f_{k(i)}) + d(\hat f_{k(i)}, \hat f_{k(i+j)}) + d(\hat f_{k(i+j)}, f_{k(i+j)})$,
from the triangle inequality, one has $jR_{\Delta_K} < 2\log(q) + |d(\hat f_{k(i)}, \hat f_{k(i+j)})|$.
So, $|d(\hat f_{k(i)}, \hat f_{k(i+j)})| > jR_{\Delta_K} - 2\log q > 0$. This inequality
proves that $f_{k(i)}$ and $f_{k(i+j)}$ do not reduce to the same form. Experiments
actually show that asymptotically, $|d(\hat f_{k(i)}, f_{k(i)})|$ is very small on average (smaller
than 1). As a consequence, as pictured in Figure 2, $d(\mathbf{1}_{\Delta_q}, \hat f_{k(i)}) \approx iR_{\Delta_K}$.


[Figure 1: two plots, (a) Imaginary case and (b) Real case.]

Fig. 1. Probability that $|M_k| < |\Delta_q|^{1/9}$ in function of the bit-size $\lambda$ of p and q


[Figure 2]

Fig. 2. Repartition of the forms $\hat f_{k(i)}$ along the principal cycle


Moreover, as in the imaginary case, experiments show that asymptotically the
probability that the norm of the matrices of reduction, $|M_k|$, is smaller than
$\Delta_q^{1/9}$ is close to 1 (see Figure 1(b)). This leads to the following heuristic.


Heuristic 2 (Real case). From the principal form $\mathbf{1}_{\Delta_q}$, a reduced form $\hat f_k$
such that the matrix of the reduction, $M_k$, satisfies $|M_k| < \Delta_q^{1/9}$, can be found in
$O(R_{\Delta_K})$ successive applications of Rho.


We did also some experiments to investigate the case where the bit-sizes of p 
and q are unbalanced. In particular when the size of q grows, the norm of the 
matrix of reduction becomes larger. For example, for a 100-bit p and a 200-bit q 
(resp. a 300-bit q), more than 95% (resp. 90%) of the fk have a matrix Mp with 
|Mz| < Ad/®?® (resp. |My] < Ad/>**). 
3 A Rigorous Homogeneous Variant of Coppersmith’s
Root Finding Method


Our factoring algorithm searches many times for small modular roots of
degree two homogeneous polynomials and the most popular technique to find
them is based on Coppersmith’s method (see May’s survey [May07]).
Our problem is the following: Given $f(x, y) = x^2 + bxy + cy^2$ a (monic)
binary quadratic form and $N = pq^2$ an integer of unknown factorisation, find
$(x_0, y_0) \in \mathbb{Z}^2$ such that $f(x_0, y_0) \equiv 0 \pmod{q^2}$, while $|x_0|, |y_0| \le M$, where
$M \in \mathbb{N}$. The usual technique for this kind of problem is only heuristic, since it
is the gcd extension of bivariate congruences. Moreover, precise bounds cannot
be found in the literature. Fortunately, because our polynomial is homogeneous,
we will actually be able to prove the method. This homogeneous variant is quite
similar to the one-variable standard Coppersmith method, but is indeed even
simpler to describe and more efficient since there is no need to balance
coefficients. We denote as $\|\cdot\|$ the usual Euclidean norm for polynomials. The main
tool to solve this problem is given by the following variant of the widespread
elementary Howgrave-Graham’s lemma [How97].


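Lemma 1. Let $B \in \mathbb{N}$ and $g(x, y) \in \mathbb{Z}[x, y]$ be a homogeneous polynomial of
total degree $\delta$. Let $M > 0$ be a real number and suppose that
$\|g(x, y)\| < \frac{B}{\sqrt{\delta + 1}\,M^\delta}$;
then for all $x_0, y_0 \in \mathbb{Z}$ such that $g(x_0, y_0) \equiv 0 \pmod B$ and $|x_0|, |y_0| \le M$,
$g(x_0, y_0) = 0$.


Proof. Let $g(x, y) = \sum_{i=0}^{\delta} g_i x^i y^{\delta - i}$ where some $g_i$’s might be zero. We have

\[ |g(x_0, y_0)| \le \sum_{i=0}^{\delta} |g_i|\,|x_0^i y_0^{\delta - i}| \le M^\delta \sum_{i=0}^{\delta} |g_i|
   \le M^\delta \sqrt{\delta + 1}\,\|g(x, y)\| < B \]

and therefore $g(x_0, y_0) = 0$.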
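
The statement of the lemma can be tested by brute force on a small polynomial. In the Python sketch below, the function names and the example g(x, y) = x² − 3xy + 2y² = (x − y)(x − 2y) are our own choices; the code checks the norm condition and then enumerates all pairs in the box to confirm that every root modulo B is a root over the integers.

```python
from itertools import product
from math import sqrt

def evaluate(g, x, y):
    """g[i] is the coefficient of x^i * y^(delta - i)."""
    delta = len(g) - 1
    return sum(gi * x**i * y**(delta - i) for i, gi in enumerate(g))

def lemma1_applies(g, B, M):
    """Check the hypothesis ||g|| < B / (sqrt(delta + 1) * M^delta)."""
    delta = len(g) - 1
    norm = sqrt(sum(gi * gi for gi in g))
    return norm < B / (sqrt(delta + 1) * M**delta)

def modular_roots(g, B, M):
    """All (x0, y0) with |x0|, |y0| <= M and g(x0, y0) = 0 mod B."""
    box = range(-M, M + 1)
    return [(x, y) for x, y in product(box, repeat=2) if evaluate(g, x, y) % B == 0]

g = [2, -3, 1]          # 2*y^2 - 3*x*y + x^2
B, M = 200, 5
print(lemma1_applies(g, B, M))
print(all(evaluate(g, x, y) == 0 for x, y in modular_roots(g, B, M)))
```

With a smaller modulus such as B = 100 the hypothesis of the lemma is no longer satisfied for M = 5, so the conclusion is not guaranteed in that case.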


The trick is then to find only one small enough bivariate homogeneous
polynomial satisfying the conditions of this lemma and to extract the rational
root of the corresponding univariate polynomial with standard techniques.
On the contrary, the original Howgrave-Graham’s lemma suggests to look for
two polynomials of small norm having $(x_0, y_0)$ as integral root, and to recover
it via elimination theory. The usual way to obtain these polynomials is to form
a lattice spanned by a special family of polynomials, and to use the LLL
algorithm (cf. [LLL82]) to obtain the two “small” polynomials. Unfortunately, this
reduction does not guarantee that these polynomials will be algebraically
independent, and the elimination can then lead to a trivial relation. Consequently,
this bivariate approach is heuristic. Fortunately, for homogeneous polynomials,
we can take another approach by using Lemma 1 and then considering a
univariate polynomial with a rational root. This makes the method rigorous and
slightly simpler since we need a bound on $\|g(x, y)\|$ and not on $\|g(xX, yY)\|$ if
X and Y are bounds on the roots and therefore the resulting lattice has smaller
determinant than in the classical bivariate approach.
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To evaluate the maximum of the bound we can obtain, we need the size of 
the first vector provided by LLL which is given by: 


Lemma 2 (LLL). Let $L$ be a full-rank lattice in $\mathbb{Z}^d$ spanned by an integer basis $B = \{b_1, \ldots, b_d\}$. The LLL algorithm, given $B$ as input, will output in time $O(d^6 \log^3(\max \|b_i\|))$ a non-zero vector $u \in L$ satisfying $\|u\| \le 2^{(d-1)/4} \det(L)^{1/d}$.


We will now prove the following general result regarding the modular roots of 
bivariate homogeneous polynomials which can be of independent interest. 


Theorem 2. Let $f(x,y) \in \mathbb{Z}[x,y]$ be a homogeneous polynomial of degree $\delta$ with $f(x,0) = x^{\delta}$, $N$ be a non-zero integer and $\alpha$ be a rational number in $[0,1]$, then one can retrieve in polynomial time in $\log N$, $\delta$ and the bit-size of $\alpha$, all the rationals $x_0/y_0$, where $x_0$ and $y_0$ are integers such that $\gcd(f(x_0, y_0), N) > N^{\alpha}$ and $|x_0|, |y_0| < N^{\alpha^2/(2\delta)}$.


Proof. Let $b$ be a divisor of $N$ for which there exists $(x_0, y_0) \in \mathbb{Z}^2$ such that $b = \gcd(f(x_0, y_0), N) > N^{\alpha}$. We define some integral parameters (to be specified later) $m$, $t$ and $t'$ with $t = m + t'$ and construct a family of $\delta t + 1$ homogeneous polynomials $g$ and $h$ of degree $\delta t$ such that $(x_0, y_0)$ is a common root modulo $b^m$. More precisely, we consider the following polynomials

$$g_{i,j}(x, y) = x^j y^{\delta(t-i)-j} f^i N^{m-i} \quad \text{for } i = 0, \ldots, m-1,\ j = 0, \ldots, \delta-1$$
$$h_i(x, y) = x^i y^{\delta t'-i} f^m \quad \text{for } i = 0, \ldots, \delta t'.$$


We build the triangular matrix $L$ of dimension $\delta t + 1$, containing the coefficients of the polynomials $g_{i,j}$ and $h_i$. We will apply LLL to the lattice spanned by the rows of $L$. The columns correspond to the coefficients of the monomials $y^{\delta t}, xy^{\delta t-1}, \ldots, x^{\delta t-1}y, x^{\delta t}$. Let $\beta \in [0,1]$ be such that $M = N^{\beta}$. The product of the diagonal elements gives $\det(L) = N^{\delta m(m+1)/2}$. If we omit the quantities that do not depend on $N$, to satisfy the inequality of Lemma 1 with the root bound $M$, the LLL bound from Lemma 2 implies that we must have

$$\delta m(m + 1)/2 < (\delta t + 1)(\alpha m - \delta t \beta) \qquad (3)$$

and if we set $\lambda$ such that $t = \lambda m$, this gives asymptotically $\beta < \frac{\alpha}{\delta\lambda} - \frac{1}{2\delta\lambda^2}$, which is maximal when $\lambda = \frac{1}{\alpha}$, and in this case, $\beta_{\max} = \alpha^2/(2\delta)$. The vector output by LLL gives a homogeneous polynomial $\tilde f(x,y)$ such that $\tilde f(x_0, y_0) = 0$ thanks to Lemma 1. Let $r = x/y$; any rational root of the form $x_0/y_0$ can be found by extracting the rational roots of $\tilde f'(r) = \frac{1}{y^{\delta t}} \tilde f(x,y)$ with classical methods.


For the case we are most interested in, $\delta = 2$, $N = pq^2$ with $p$ and $q$ of the same size, i.e., $\alpha = 2/3$, then $\lambda = 3/2$ and we can asymptotically get roots up to $N^{\beta}$ with $\beta = \frac{1}{9}$. If we take $m = 4$ and $t = 6$, i.e., we work with a lattice of dimension 13, we get from (3) that $\beta \approx \frac{1}{10.64}$, and with a 31-dimensional lattice ($m = 10$ and $t = 15$), $\beta \approx \frac{1}{9.62}$. If the size of $q$ grows compared to $p$, i.e., $\alpha$ increases towards 1, then $\beta$ increases towards 1/4. For example, if $q$ is two times larger than $p$, i.e., $\alpha = 4/5$, then $\beta = 1/6.25$. For $\alpha = 6/7$, we get $\beta \approx 1/5.44$.

We will call HomogeneousCoppersmith the algorithm which implements this method. It takes as input an integer $N = pq^2$ and a binary quadratic form $[a, b, c]$, from which we deduce the unitary polynomial $x^2 + b'xy + c'y^2$ by dividing both $b$ and $c$ by $a$ modulo $N$, and the parameters $m$ and $t$. In fact, this method will only disclose proper representations of $q^2$, those for which $x$ and $y$ are coprime, but we note that $f_k$ properly represents $q^2$, and therefore so does our form $[a, b, c]$.
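The numerical bounds quoted for inequality (3) can be reproduced with exact rational arithmetic. The following is a minimal sketch, assuming the reading of (3) as $\delta m(m+1)/2 < (\delta t+1)(\alpha m - \delta t\beta)$; the function names are illustrative and not part of the paper.

```python
from fractions import Fraction

def beta_from_condition(delta, alpha, m, t):
    """Supremum of the beta values allowed by inequality (3):
    delta*m*(m+1)/2 < (delta*t + 1) * (alpha*m - delta*t*beta)."""
    dim = delta * t + 1  # lattice dimension
    return (alpha * m - Fraction(delta * m * (m + 1), 2 * dim)) / (delta * t)

def beta_max(delta, alpha):
    """Asymptotic optimum, reached for t = m / alpha."""
    return alpha**2 / (2 * delta)

alpha = Fraction(2, 3)  # N = p*q^2 with p and q of the same size
for m, t in [(4, 6), (10, 15)]:
    beta = beta_from_condition(2, alpha, m, t)
    print(f"m={m}, t={t}, dimension={2 * t + 1}, 1/beta = {float(1 / beta):.4f}")
print("asymptotic 1/beta =", 1 / beta_max(2, alpha))
```

Working with `Fraction` avoids any rounding in the comparison between small lattice dimensions and the asymptotic value.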

The case $\alpha = 1$ of Theorem 2 can already be found in Joux's book and we mention that a similar technique has already been independently investigated by Bernstein in [Ber08].


4 A $\tilde{O}(p^{1/2})$-Deterministic Factoring Algorithm for $pq^2$


We detail our new quadratic form-based factoring algorithm for numbers of the form $pq^2$. In this section, $p$ and $q$ will be of the same bit-size, and $p \equiv 1 \pmod 4$.


4.1 The Algorithm 


Roughly speaking, if $\Delta_q = N = pq^2$, our factoring algorithm, depicted in Fig. 3, exploits the fact that the non-reduced forms $f_k = [q^2, kq, -]$ reduce to forms $\hat f_k$ for which there exists a small pair $(x_0, y_0)$ such that $q^2 \mid \hat f_k(x_0, y_0)$ while $q^2 \mid N$. From Theorem 1 we know that these reduced forms appear on the principal cycle of the class group of discriminant $\Delta_q$. To detect them, we start a walk in the principal cycle from the principal form $1_N$, and apply Rho until the Coppersmith-like method finds these small solutions.


Input: $N = pq^2$, $m$, $t$
Output: $p$, $q$

1. $h \leftarrow 1_N$
2. while $(x_0, y_0)$ not found do
   2.1. $h \leftarrow$ Rho($h$)
   2.2. $x_0/y_0 \leftarrow$ HomogeneousCoppersmith($h$, $N$, $m$, $t$)
3. $q \leftarrow$ Sqrt(Gcd($h(x_0, y_0)$, $N$))
4. return $(N/q^2, q)$


Fig. 3. Factoring $N = pq^2$
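Steps 3 and 4 of the algorithm, the extraction of $q$ from a form value, can be illustrated on a toy instance. This is only a sketch under strong simplifying assumptions: the walk with Rho and the call to HomogeneousCoppersmith are skipped, and the root $(1, 0)$ is read off directly from a non-reduced form $[q^2, kq, (k^2-p)/4]$ built from hypothetical values $p = 13$, $q = 7$, $k = 1$.

```python
from math import gcd, isqrt

def eval_form(form, x, y):
    """Value of the binary quadratic form [a, b, c] at (x, y)."""
    a, b, c = form
    return a * x * x + b * x * y + c * y * y

def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c

def split_from_root(form, x0, y0, N):
    """Steps 3-4 of Fig. 3: q = Sqrt(Gcd(h(x0, y0), N)), return (N / q^2, q)."""
    g = gcd(eval_form(form, x0, y0), N)
    q = isqrt(g)
    if q <= 1 or q * q != g:
        raise ValueError("the root does not expose q^2")
    return N // g, q

# Hypothetical toy values (not from the paper): p = 13 = 1 mod 4, q = 7, k = 1.
p, q, k = 13, 7, 1
N = p * q * q
form = (q * q, k * q, (k * k - p) // 4)  # discriminant equals p * q^2

print(split_from_root(form, 1, 0, N))
```

In the actual algorithm the pair $(x_0, y_0)$ is a small root of a reduced form met on the principal cycle, which is precisely what the lattice step is needed for.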


4.2 Heuristic Correctness and Analysis of Our Algorithm 


Assuming Heuristic 2, starting from $1_N$, after $O(R_p)$ iterations, the algorithm will stop on a reduced form whose roots will be found with our Coppersmith-like method (for suitable values of $m$ and $t$) since they will satisfy the expected $N^{1/9}$ bound. The computation of $\gcd(h(x_0, y_0), N)$ will therefore expose $q^2$ and factor $N$. The time complexity of our algorithm is then heuristically $O(R_p\,\mathrm{Poly}(\log N))$, whereas the space complexity is $O(\log N)$. The worst-case complexity is $O(p^{1/2} \log p\,\mathrm{Poly}(\log N))$. For small regulators, such as in the REAL-NICE cryptosystem (see Subsection 5.1), the time complexity is polynomial.

This algorithm can be generalised with a few modifications to primes $p$ such that $p \equiv 3 \pmod 4$, by considering $\Delta_q = 4pq^2$. Moreover, if the bit-sizes of $p$ and $q$ are unbalanced, our experiments suggest that the size of the roots will be small enough (see end of Subsection 2.3 and Section 3), so the factoring algorithm will also work in this case, with the same complexity.


Comparison with other Deterministic Factorisation Methods. Boneh, Durfee and Howgrave-Graham presented in [BDH99] an algorithm for factoring integers $N = p^r q$. Their main result is the following:


Lemma 3 ([BDH99]). Let $N = p^r q$ be given, and assume $q < p^c$ for some $c$. Furthermore, assume that $P$ is an integer satisfying $|P - p| < p^{1 - \frac{c}{r+c} - 2\frac{r}{d}}$. Then the factor $p$ may be computed from $N$, $r$, $c$ and $P$ by an algorithm whose running time is dominated by the time it takes to run LLL on a lattice of dimension $d$.


For $r = 2$ and $c = 1$, this leads to a deterministic factoring algorithm which consists in exhaustively searching for an approximation $P$ of $p$ and solving the polynomial equation $(P + X)^2 \equiv 0 \pmod{p^2}$ with a method à la Coppersmith. The approximation will be found after $O(p^{1/3}) = O(N^{1/9})$ iterations.

The fastest deterministic generic integer factorisation algorithm is actually a version of Strassen's algorithm from Bostan, Gaudry and Schost [BGS07], who improve on a work of Chudnovsky and Chudnovsky and prove a complexity of $O(M_{\mathrm{int}}(\sqrt[4]{N} \log N))$, where $M_{\mathrm{int}}$ is a function such that integers of bit-size $d$ can be multiplied in $M_{\mathrm{int}}(d)$ bit operations. More precisely, for numbers of our interest, Lemma 13 from [BGS07] gives the precise complexity:


Lemma 4 ([BGS07]). Let $b, N$ be two integers with $2 \le b < N$. One can compute a prime divisor of $N$ bounded by $b$, or prove that no such divisor exists, in $O(M_{\mathrm{int}}(\sqrt{b} \log N) + \log b\, M_{\mathrm{int}}(\log N) \log\log N)$ bit operations and space $O(\sqrt{b} \log N)$ bits.


In particular, for $b = N^{1/3}$, the complexity is $\tilde{O}(N^{1/6})$, with a very large space complexity compared to our algorithm. Moreover, neither of these last two algorithms can actually factor an integer of cryptographic size. The fact that a prime divisor has a small regulator does not help in these algorithms, whereas it makes the factorisation polynomial in our method.


5 Cryptanalysis of the NICE Cryptosystems 


Hartmann, Paulus and Takagi proposed the elegant NICE encryption scheme (see [HPT99,PT99,PT00]), based on imaginary quadratic fields and whose main feature was a quadratic decryption time. Later on, several other schemes, including (special) signature schemes relying on this framework, have been proposed. The public key of these NICE cryptosystems contains a discriminant $\Delta_q = -pq^2$ together with a reduced ideal $\mathfrak{h}$ whose class belongs to the kernel of $\bar\varphi_q$. The idea underlying the NICE cryptosystem is to hide the message behind a random element $[\mathfrak{h}]^r$ of the kernel. Applying $\bar\varphi_q$ will make this random element disappear, and the message will then be recovered.

In [JSW08], Jacobson, Scheidler and Weimer embedded the original NICE cryptosystem in real quadratic fields. Whereas the idea remains essentially the same as the original, the implementation is very different. The discriminant is now $\Delta_q = pq^2$, but because of the differences between the imaginary and the real setting, these discriminants will have to be chosen carefully. Among these differences, the class numbers are expected to be small with very high probability (see the Cohen-Lenstra heuristics [CL84]). Moreover, an equivalence class does not contain a unique reduced element anymore, but a multitude of them, whose number is governed by the size of the fundamental unit. The rough ideas to understand these systems and our new attacks are given in the following. The full description of the systems is omitted for lack of space but can be found in [HPT99,JSW08].


5.1 Polynomial-Time Key Recovery in the Real Setting 


The core of the design of the REAL-NICE encryption scheme is the very particular choice of the secret prime numbers $p$ and $q$ such that $\Delta_K = p$ and $\Delta_q = pq^2$. They are chosen such that the ratio $R_{\Delta_q}/R_{\Delta_K}$ is of the order of magnitude of $q$ and that $R_{\Delta_K}$ is bounded by a polynomial in $\log(\Delta_K)$. To ensure the first property, it is sufficient to choose $q$ such that $q - \left(\frac{p}{q}\right)$ is a small multiple of a large prime. Although the second property is very unlikely to happen naturally, since the regulator of $p$ is generally of the order of magnitude of $\sqrt{p}$, it is indeed quite easy to construct fundamental primes with small regulator. The authors of [JSW08] suggest producing a prime $p$ as a so-called Schinzel sleeper, which is a positive squarefree integer of the form $p = a^2x^2 + 2bx + c$ with $a, b, c, x$ in $\mathbb{Z}$, $a \ne 0$ and $b^2 - 4ac$ dividing $4\gcd(a^2, b)^2$. Schinzel sleepers are known to have a regulator of the order $\log(p)$ (see [CW05]). Some care must be taken when setting the (secret) $a, b, c, x$ values, otherwise the resulting $\Delta_q = pq^2$ is subject to factorisation attacks described in [Wei04]. We do not provide here more details on these choices since the crucial property for our attack is the fact that the regulator is actually of the order $\log(p)$. The public key consists of the sole discriminant $\Delta_q$. The message is carefully embedded (and padded) into a primitive $\mathcal{O}_{\Delta_q}$-ideal so that it will be recognised during decryption. Instead of moving the message ideal $\mathfrak{m}$ to a different equivalence class (like in the imaginary case), the encryption actually hides the message in the cycle of reduced ideals of its own equivalence class by multiplication by a random principal $\mathcal{O}_{\Delta_q}$-ideal $\mathfrak{h}$ (computed during encryption). The decryption process then consists in applying the (secret) map $\bar\varphi_q$ and performing an exhaustive search for the padded message in the small cycle of $\bar\varphi_q([\mathfrak{m}\mathfrak{h}])$. This exhaustive search is actually possible thanks to the choice of $p$, which has a very small regulator. Like in the imaginary case, the decryption procedure has a quadratic complexity and significantly outperforms an RSA decryption for any given security level (see Table 3 from [JSW08]).

Unfortunately, due to the particular but necessary choice of the secret prime $p$, the following result states the total insecurity of the REAL-NICE system.


Result 1. The algorithm of Fig. 3 recovers the secret key of REAL-NICE in polynomial time in the security parameter under Heuristic 2, since the secret fundamental discriminant $p$ is chosen to have a regulator bounded by a polynomial in $\log p$.


We apply the cryptanalysis on the following example. The Schinzel polynomial $S(X) = 2725^2 X^2 + 2 \cdot 3815 X + 2$ produces a suitable 256-bit prime $p$ for the value $X_0 = 103042745825387139695432123167592199$. This prime has a regulator $R_{\Delta_K} \approx 90.83$. The second 256-bit prime $q$ is chosen following the recommendations from [Wei04]. This leads to the discriminant


$\Delta_q$ = 28736938823310044873380716142282073396186843906757463274792638734144060602830510
80738669163489273592599054529442271053869832485363682341892124500678400322719842
63278692833860326257638544601057379571931906787755152745236263303465093


Our algorithm recovers the prime 


q = 60372105471499634417192859173853663456123015267207769653235558092781188395563 


from $\Delta_q$ after 45 iterations in 42.42 seconds on a standard laptop. The rational root $x_0/y_0$ is equal to $-\frac{2155511611710996445623}{3544874277134778658948}$, where $x_0$ and $y_0$ satisfy $\frac{\log(\Delta_q)}{\log(|x_0|)} \approx 10.8$ and $\frac{\log(\Delta_q)}{\log(|y_0|)} \approx 10.7$.
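The size claims of this example can be re-derived from the printed numbers. The sketch below only checks magnitudes (the bit-length of $p = S(X_0)$ and the two logarithm ratios); it assumes the digits of $\Delta_q$ survived extraction intact and does not attempt to verify primality or the factorisation itself.

```python
import math

# Values copied from the example; DELTA_Q is the three printed lines joined.
X0 = 103042745825387139695432123167592199
p = 2725**2 * X0**2 + 2 * 3815 * X0 + 2  # Schinzel polynomial S(X0)

DELTA_Q = int(
    "28736938823310044873380716142282073396186843906757463274792638734144060602830510"
    "80738669163489273592599054529442271053869832485363682341892124500678400322719842"
    "63278692833860326257638544601057379571931906787755152745236263303465093"
)
x0 = 2155511611710996445623
y0 = 3544874277134778658948

# math.log accepts arbitrarily large Python integers.
ratio_x = math.log(DELTA_Q) / math.log(x0)
ratio_y = math.log(DELTA_Q) / math.log(y0)

print("bit length of p:", p.bit_length())
print("log(Delta_q)/log|x0| =", round(ratio_x, 2))
print("log(Delta_q)/log|y0| =", round(ratio_y, 2))
```

Both ratios exceed 9, so the root is smaller than the asymptotic $N^{1/9}$ bound of Section 3.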


5.2 Polynomial-Time Key Recovery of the Original NICE 


As mentioned above, the public key of the original NICE cryptosystem contains the representation of a reduced ideal $\mathfrak{h}$ whose class belongs to the kernel of the surjection $\bar\varphi_q$. The total break of the NICE cryptosystem is equivalent to solving the following kernel problem.


Definition 6 (Kernel Problem [BPT04]). Let $\lambda$ be an integer, $p$ and $q$ be two $\lambda$-bit primes with $p \equiv 3 \pmod 4$. Fix a non-fundamental discriminant $\Delta_q = -pq^2$. Given an element $[\mathfrak{h}]$ of $\ker \bar\varphi_q$, factor the discriminant $\Delta_q$.


Castagnos and Laguillaumie proposed in [CL09] a polynomial-time algorithm to solve this problem. We propose here a completely different solution within the spirit of our factorisation method and whose complexity is also polynomial-time. As discussed in Subsection 2.3, the idea is to benefit from the fact that the public ideal $\mathfrak{h}$ corresponds to a reduced quadratic form, $f_{\mathfrak{h}}$, which represents $q^2$. We thus find these $x_0$ and $y_0$ such that $\gcd(f_{\mathfrak{h}}(x_0, y_0), \Delta_q) = q^2$ with the Coppersmith method of Section 3.


Result 2. The Homogeneous Coppersmith method from Section 3 solves the
Kernel Problem in polynomial time in the security parameter under Heuristic] 
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We apply our key recovery on the example of NICE proposed in [JJOOCLO9: 


Ag = —1001133619402846750073919037082619174565372425946674915149340539464219927955168 
18216760083640752198709726199732701843864411853249644535365728802022498185665592 
98370854645328210791277591425676291349013221520022224671621236001656120923 


a = 5702268770894258318168588438117558871300783180769995195092715895755173700399 
141486895731384747 


b = 3361236040582754784958586298017949110648731745605930164666819569606755029773 
074415823039847007 


The public key consists of $\Delta_q$ and $\mathfrak{h} = (a, b)$. Our Coppersmith method finds in less than half a second the root $u_0 = x_0/y_0$ and


$h(x_0, y_0)$ = 5363123171977038839829609999282338450991746328236957351089
4245774887056120365979002534633233830227721465513935614971
593907712680952249981870640736401120729 = $q^2$.


All our experiments have been run on a standard laptop under Linux with the software Sage. The lattice reduction has been performed with Stehlé's fplll [Ste].

Abstract. We look at iterated power generators $s_i = s_{i-1}^e \bmod N$ for a random seed $s_0 \in \mathbb{Z}_N$ that in each iteration output a certain amount of bits. We show that heuristically an output of $(1 - \frac{1}{e}) \log N$ most significant bits per iteration allows for efficient recovery of the whole sequence.
This means in particular that the Blum-Blum-Shub generator should be 
used with an output of less than half of the bits per iteration and the 
RSA generator with e = 3 with less than a -fraction of the bits. 

Our method is lattice-based and introduces a new technique, which 
combines the benefits of two techniques, namely the method of lineariza- 
tion and the method of Coppersmith for finding small roots of polynomial 
equations. We call this new technique unravelled linearization. 


Keywords: power generator, lattices, small roots, systems of equations. 


1 Introduction 


Pseudorandom number generators (PRGs) play a crucial role in cryptography. An especially simple construction is provided by iterating the RSA function $s_i = s_{i-1}^e \bmod N$ for an RSA modulus $N = pq$ of bit-size $n$ and a seed $s_0 \in \mathbb{Z}_N$. This so-called power generator outputs in each iteration a certain amount of bits of $s_i$, usually the least significant bits. In order to minimize the amount of computation per iteration, one typically uses small $e$ such as $e = 3$. With slight modifications one can choose $e = 2$ as well when replacing the iteration function by the so-called absolute Rabin function BH], where $s^2 \bmod N$ is defined to be $\min\{s^2 \bmod N, N - s^2 \bmod N\}$, $N$ is a Blum integer and $s_0$ is chosen from $\{0, \ldots, \frac{N-1}{2}\}$ with Jacobi symbol $+1$.

It is well-known that under the RSA assumption one can safely output up to $O(\log n) = O(\log\log N)$ bits per iteration [IS]. At Asiacrypt 2006, Steinfeld, Pieprzyk and Wang [J showed that under a stronger assumption regarding the optimality of some well-studied lattice attacks, one can securely output $(\frac{1}{2} - \frac{1}{e} - \epsilon - o(1))n$ bits. The assumption is based on a specific RSA one-wayness problem, where one is given an RSA ciphertext $c = m^e \bmod N$ together with a certain fraction of the plaintext bits of $m$, and one has to recover the whole plaintext $m$. We call this generator the SPW generator. The SPW generator has the desirable property that one can output a constant fraction $\Omega(\log N)$ of all bits per iteration. Using an even stronger assumption, Steinfeld, Pieprzyk and Wang could improve the output size to $(\frac{1}{2} - \frac{1}{2e} - \epsilon - o(1))n$ bits.

A natural question is whether the amount of output bits of the SPW generator is maximal. Steinfeld et al.'s security proof uses in a black-box fashion the security proof of Fischlin and Schnorr for RSA bits [8]. This proof unfortunately introduces a factor of $\frac{1}{2}$ for the output rate of the generator. So, Steinfeld et al. conjecture that one might improve the rate to $(1 - \frac{1}{e} - \epsilon)n$ using a different proof technique. Here, $\epsilon$ is a security parameter and has to be chosen such that performing $2^{\epsilon n}$ operations is infeasible. We show that this bound is essentially the best that one can hope for by giving an attack up to the bound $(1 - \frac{1}{e})n$.

In previous cryptanalytic approaches, upper bounds for the number of output bits have been studied by Blackburn, Gomez-Perez, Gutierrez and Shparlinski [2]. For $e = 2$ and a class of PRGs similar to power generators (but with prime moduli), they showed that provably $\frac{2}{3}n$ bits are sufficient to recover the secret seed $s_0$. As mentioned in Steinfeld et al., this bound can be generalized to $(1 - \frac{1}{e+1})n$ using the heuristic extension of Coppersmith's method [7] to multivariate equations.


Our contribution: We improve the cryptanalytic bound to $(1 - \frac{1}{e})n$ bits using a new heuristic lattice-based technique. Notice that the two most interesting cases are $e = 2, 3$, the Blum-Blum-Shub generator and the RSA generator. For these cases, we improve on the best known attack bounds from $\frac{2}{3}n$ to $\frac{1}{2}n$ and from $\frac{3}{4}n$ to $\frac{2}{3}n$, respectively. Unfortunately, similar to the result of Blackburn et al., our results are restricted to power generators that output most significant bits in each iteration. It remains an open problem to show that the bounds hold for least significant bits as well.

Our improvement comes from a new technique called unravelled linearization, 
which is a hybrid of lattice-based linearization (see for an overview) and 
the lattice-based technique due to Coppersmith [7]. Let us illustrate this new 
technique with a simple example. Assume we want to solve a polynomial equation $x^2 + ax + b = y \bmod N$ for some given $a, b \in \mathbb{Z}_N$ and some unknowns $x, y$. This problem can be considered as finding the modular roots of a univariate polynomial $f(x) = x^2 + ax + b$ with some error $y$.

It is a well-known heuristic that a linear modular equation can be easily solved by computing a shortest lattice vector, provided that the absolute value of the product of the unknowns is smaller than the modulus [13]. In order to linearize our equation, we substitute $u := x^2$ and end up with a linear equation in $u, x, y$. This can be solved whenever $|uxy| < N$. If we assume for simplicity that the unknowns $x, y$ are of the same size, this yields the condition $|x| < N^{\frac{1}{4}}$.

However, in the above case it is easy to see that this linearization is not optimal. A better linearization would define $u := x^2 - y$, leaving us with a linear equation in $u, x$ only. This yields the superior condition $|x| < N^{\frac{1}{3}}$. So one benefits from the fact that one can easily glue variables together, in our case $x^2$ and $y$, whenever this does not change the size of the larger variable. In our example this would also work when $y$ had a known coefficient $c$ of size $|c| \approx |y|$.
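The two conditions follow from simple exponent bookkeeping: if every unknown is bounded by a power of $|x|$, the requirement that the product of the unknowns stay below $N$ becomes a bound on $|x|$. A minimal sketch of that arithmetic, with an illustrative helper name that is not from the paper:

```python
from fractions import Fraction

def linearization_bound(size_exponents):
    """Each unknown is bounded by |x|^e for e in size_exponents.
    The heuristic condition 'product of unknowns < N' then reads
    |x|^(sum of e) < N, i.e. |x| < N^(1 / sum of e)."""
    return Fraction(1, sum(size_exponents))

# Naive linearization: unknowns u = x^2, x, y with |y| about |x|.
naive = linearization_bound([2, 1, 1])
# Glued linearization: unknowns u = x^2 - y (still about |x|^2) and x.
glued = linearization_bound([2, 1])

print("naive exponent:", naive)
print("glued exponent:", glued)
```

Gluing $y$ into $u$ removes one unknown from the product without enlarging $u$, which is where the gain comes from.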

The main benefit from the attack of Blackburn et al. [2] comes from a clever linearization of the variables that occur in the case of power generators. While on the one hand such a linearization of a polynomial equation offers some advantages, on the other hand we lose the algebraic structure. Performing e.g. the substitution $u := x^2$, one obtains a linear equation in $u, x, y$, but the property that $u$ and $x$ are algebraically dependent (one being the square of the other) is completely lost. Naturally, this drawback becomes more dramatic when looking at higher degree polynomials.

As a consequence, Coppersmith [7] designed in 1996 a lattice-based method that is well-suited for exploiting polynomial structures. The underlying idea is to additionally use algebraic relations before linearization. Let us illustrate this idea with our example polynomial $f(x, y) = x^2 + ax + b - y$. We know that whenever $f$ has a small root modulo $N$, then also $xf = x^3 + ax^2 + bx - xy$ shares this root. Using $xf$ as well, we obtain two modular equations in five unknowns $x^3, x^2, x, y, xy$. Notice that the unknowns $x^2$ and $x$ are re-used in the second equation, which reflects the algebraic structure. So even after linearizing both equations, Coppersmith's method preserves some polynomial structure. In addition to multiplication of $f$ by powers of $x$ and $y$, which is often called shifting in the literature, one also allows for powers $f^i$ with the additional benefit of obtaining equations modulo larger moduli $N^i$.

When we compute the enabling condition with Coppersmith’s method for 
our example f(x,y) using an optimal shifting and powering, we obtain a bound 
of |x| < N3. So the method yields a better bound than naive linearization, 
but cannot beat the bound of the more clever linearization with $u := x^2 - y$. Even worse, Coppersmith's method results in the use of lattices of much larger dimension.

To summarize, linearization makes use of the similarity of coefficients in a 
polynomial equation, whereas Coppersmith’s method basically makes use of the 
structure of the polynomial’s monomial set. 


Motivation for unravelled linearization: Our new technique of unravelled linearization aims to bring together the best of both worlds. Namely, we allow for clever linearization but still exploit the polynomial structure. Unravelled linearization proceeds in three steps: linearization, basis construction, and unravellation. Let us illustrate these steps with our example $f(x,y)$, where we use the linearization $u := x^2 - y$ in the first step. In this case, we end up with a linear polynomial $g(u, x)$. Similar to Coppersmith's approach, in the second step we use shifts and powers of this polynomial. E.g., $g^2$ defines an equation in the unknowns $u^2, ux, x^2, u, x$ modulo $N^2$. But since we start with a linear polynomial $g$, this alone will not bring us any benefits, because the algebraic structure got lost in the linearization process from $f$ to $g$.

Therefore, in the third step we partially unravel the linearization for $g^2$ using the relation $x^2 = y + u$. The unravelled form of $g^2$ defines a modular equation in the unknowns $u^2, ux, y, u, x$, where we basically substitute the unknown $x^2$ by the unknown $y$. Notice here that we can reuse the variable $u$, which occurs in $g^2$ anyway. This substitution leads to a significant gain, since $y$ is much smaller in size than $x^2$.
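The effect of the third step on the monomial set of $g^2$ can be made concrete with a small symbolic computation. The sketch below uses a hand-rolled dictionary representation of polynomials in $(u, x, y)$ and hypothetical coefficients $a = 5$, $b = 7$; it is an illustration of the substitution $x^2 = u + y$, not code from the paper.

```python
from collections import defaultdict

# A polynomial maps exponent triples (e_u, e_x, e_y) to integer coefficients.
def poly_mul(p, q):
    out = defaultdict(int)
    for (u1, x1, y1), c1 in p.items():
        for (u2, x2, y2), c2 in q.items():
            out[(u1 + u2, x1 + x2, y1 + y2)] += c1 * c2
    return {mono: c for mono, c in out.items() if c != 0}

def unravel(p):
    """Replace every x^2 by u + y, so that x occurs with degree at most 1."""
    result = defaultdict(int)
    for (eu, ex, ey), c in p.items():
        # x^ex = x^(ex % 2) * (u + y)^(ex // 2)
        term = {(eu, ex % 2, ey): c}
        for _ in range(ex // 2):
            term = poly_mul(term, {(1, 0, 0): 1, (0, 0, 1): 1})
        for mono, coeff in term.items():
            result[mono] += coeff
    return {mono: c for mono, c in result.items() if c != 0}

a, b = 5, 7  # hypothetical known coefficients
g = {(1, 0, 0): 1, (0, 1, 0): a, (0, 0, 0): b}  # g(u, x) = u + a*x + b
g2 = poly_mul(g, g)
g2_unravelled = unravel(g2)

print(sorted(g2))             # contains the monomial x^2 = (0, 2, 0)
print(sorted(g2_unravelled))  # x^2 is gone, y = (0, 0, 1) appears instead
```

The coefficient of $u$ absorbs the extra $a^2$ coming from $x^2 = u + y$, which is the reuse of $u$ mentioned in the text.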

In the present paper, we elaborate on this simple observation that unravelling 
of linearization brings benefits to lattice reduction algorithms. We use the equa- 
tions that result from the power generator as a case study for demonstrating the 
power of unravelled linearization, but we are confident that our new technique 
will also find new applications in various other contexts. 

The paper is organized as follows. In Section 2 we will fix some very basic notions for lattices. In Section 3 we define our polynomials from the power generator with $e = 2$ and give a toy example with only two PRG iterations that illustrates how unravelled linearization works. This already leads to an improved bound of $\frac{7}{11}n$. In Section 4 we generalize to arbitrary lattice dimension (bound $\frac{3}{5}n$) and in Section 5 we generalize to an arbitrary number of PRG iterations (bound $\frac{1}{2}n$). In Section 6 we finally generalize to an arbitrary exponent $e$. Since our attacks rely on Coppersmith-type heuristics, we verify the heuristics experimentally in Section 7.


2 Basics on Lattices 


Let $b_1, \ldots, b_d \in \mathbb{Q}^d$ be linearly independent. Then the set

$$L := \Big\{ x \in \mathbb{Q}^d \;\Big|\; x = \sum_{i=1}^{d} a_i b_i,\ a_i \in \mathbb{Z} \Big\}$$

is called a lattice $L$ with basis matrix $B \in \mathbb{Q}^{d \times d}$, having the vectors $b_1, \ldots, b_d$ as row vectors. The parameter $d$ is called the lattice dimension, denoted by $\dim(L)$. The determinant of the lattice is defined as $\det(L) := |\det(B)|$.

The famous LLL algorithm [10] computes a basis consisting of short and pairwise almost orthogonal vectors. Let $v_1, \ldots, v_d$ be an LLL-reduced lattice basis with Gram-Schmidt orthogonalized vectors $v_1^*, \ldots, v_d^*$. Intuitively, the property of pairwise almost orthogonal vectors $v_1, \ldots, v_d$ implies that the norm of the Gram-Schmidt vectors $v_1^*, \ldots, v_d^*$ cannot be too small. This is quantified in the following theorem of Jutla [9] that follows from the LLL paper [10].


Theorem 1 (LLL). Let $L$ be a lattice spanned by $B \in \mathbb{Q}^{d \times d}$. On input $B$, the $L^3$-algorithm outputs an LLL-reduced lattice basis $\{v_1, \ldots, v_d\}$ with

$$\|v_i^*\| \ge 2^{\frac{1-i}{4}} \left( \frac{\det(L)}{b_{\max}^{\,d-i}} \right)^{\frac{1}{i}} \quad \text{for } i = 1, \ldots, d$$

in time polynomial in $d$ and in the bit-size of the largest entry $b_{\max}$ of the basis matrix $B$.

3 Power Generators with e = 2 and Two Iterations


Let us consider power generators defined by the recurrence sequence

$$s_i = s_{i-1}^e \bmod N,$$

where $N$ is an RSA modulus and $s_0 \in \mathbb{Z}_N$ is the secret seed.

Suppose that the power generator outputs in each iteration the most significant bits $k_i$ of $s_i$, i.e. $s_i = k_i + x_i$, where the $k_i$ are known for $i \ge 1$ and the $x_i$ are unknown.

Our goal is to recover all $x_i$ for a number of output bits $k_i$ that is as small as possible. In other words, if we define $x_i < N^{\delta}$ then we have to find an attack that maximizes $\delta$.

Let us start with the simplest case of two iterations and $e = 2$. The best known bound is $\delta = \frac{1}{3}$ due to Blackburn et al. [2]. We will later generalize to an arbitrary number of iterations and also to an arbitrary $e$.

For the case of two iterations, we obtain

$$s_1 = k_1 + x_1 \quad \text{and} \quad s_2 = k_2 + x_2,$$

for some unknown $s_i, x_i$. The recurrence relation of the generator $s_2 = s_1^2 \bmod N$ yields $k_2 + x_2 = (k_1 + x_1)^2 \bmod N$, which results in the polynomial equation

$$x_1^2 - x_2 + \underbrace{2k_1}_{a}\, x_1 + \underbrace{k_1^2 - k_2}_{b} = 0 \bmod N.$$

Thus, we search for small modular roots of $f(x_1, x_2) = x_1^2 - x_2 + ax_1 + b$ modulo $N$.
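The derivation of $f$ can be checked on a toy generator. The following sketch uses a deliberately tiny, hypothetical modulus $N = 61 \cdot 53$, seed and number of hidden bits (none taken from the paper) and verifies that the unknown low parts $(x_1, x_2)$ are indeed a root of $f$ modulo $N$ once $a$ and $b$ are computed from the outputs $k_1, k_2$.

```python
def split_msb(s, unknown_bits):
    """Write s = k + x, where x holds the unknown_bits low-order bits
    and k is the known high-order part output by the generator."""
    x = s % (1 << unknown_bits)
    return s - x, x

# Hypothetical toy parameters.
N = 61 * 53
UNKNOWN_BITS = 4
s0 = 1234

s1 = pow(s0, 2, N)  # first iteration, e = 2
s2 = pow(s1, 2, N)  # second iteration

k1, x1 = split_msb(s1, UNKNOWN_BITS)
k2, x2 = split_msb(s2, UNKNOWN_BITS)

# Coefficients visible to the attacker.
a = (2 * k1) % N
b = (k1 * k1 - k2) % N

f_value = (x1 * x1 - x2 + a * x1 + b) % N
print("f(x1, x2) mod N =", f_value)
```

The attacker knows $a$, $b$ and $N$; the lattice attack described next is about recovering $x_1, x_2$ from this relation when they are small enough.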

Let us first illustrate our new technique called unravelled linearization with a small-dimensional lattice attack before we apply it in full generality in Section 4.

Step 1: Linearize $f(x_1, x_2)$ into $g$.
We make the substitution $u := x_1^2 - x_2$. This leaves us with a linear polynomial $g(u, x_1) = u + ax_1 + b$.


Step 2: Basis construction.
Defining standard shifts and powers for $g$ is especially simple, since $g$ is a linear polynomial. If we fix a total degree bound of $m = 2$, then we choose $g$, $x_1 g$ and $g^2$.

Let $X := N^{\delta}$ be an upper bound for $x_1, x_2$. Then $U := N^{2\delta}$ is an upper bound for $u$. The choice of the shift polynomials results in a lattice $L$ spanned by the rows of the lattice basis $B$ depicted in Figure 1.

Fig. 1. After linearization and standard shifts and powers for m = 2

Let $(u_0, x_0)$ be a root of $g$. Then the vector $v = (1, x_0, x_0^2, u_0, u_0x_0, u_0^2, k_1, k_2, k_3)B$ has its last three coordinates equal to 0 for suitably chosen $k_i \in \mathbb{Z}$. Hence we can write $v$ as $v = (1, \frac{x_0}{X}, \ldots, \frac{u_0^2}{U^2}, 0, 0, 0)$. Since $|u_0| < U$ and $|x_0| < X$, we obtain $|v| < \sqrt{6}$.

To summarize, we are looking for a short vector $v$ in the 6-dimensional sublattice $L' = L \cap (\mathbb{Q}^6 \times 0^3)$ with $|v| < \sqrt{\dim(L')}$. Let $b_1, \ldots, b_6$ be an LLL-reduced basis of $L'$ with orthogonalized basis $b_1^*, \ldots, b_6^*$. Coppersmith [7] showed that any vector $v \in L'$ that is smaller than $b_6^*$ must lie in the sub-space spanned by $b_1, \ldots, b_5$, i.e. $v$ is orthogonal to $b_6^*$. This immediately yields a coefficient vector of a polynomial $h(u, x_1)$, which has the same roots as $g(u, x_1)$, but over the integers instead of modulo $N$. Assume that we can find two such polynomials $h_1, h_2$; then we can compute all small roots by resultant computation provided that $h_1, h_2$ do not share a common divisor. The only heuristic of our method is that the polynomials $h_1, h_2$ are indeed coprime.

By the LLL-Theorem (Theorem 1), an orthogonalized LLL-basis contains a vector $b_d^*$ in $L'$ with $|b_d^*| \ge c(d) \det(L')^{\frac{1}{d}}$, where $c(d) = 2^{-\frac{d-1}{4}}$. Thus, if the condition

$$c(d) \det(L')^{\frac{1}{d}} > \sqrt{d}$$

holds, then $\bar v = (1, \frac{x_0}{X}, \ldots, \frac{u_0^2}{U^2})$ will be orthogonal to the vector $b_d^*$.

Since $\det(L')$ is a function of $N$, we can neglect $d = \dim(L')$ for large enough $N$. This in turn simplifies our condition to

$$\det(L') > 1.$$

Moreover, one can show by a unimodular transformation of $B$ that $\det(L') = \det(L)$.

For our example, the enabling condition $\det(L) > 1$ translates to $U^4X^4 < N^4$. Plugging in the values of $X := N^{\delta}$ and $U := N^{2\delta}$, this leads to the condition $\delta < \frac{1}{3}$. Notice that this is exactly the condition from Blackburn et al. [2]. Namely, if the PRG outputs $\frac{2}{3}n$ bits per iteration, then the remaining $\frac{1}{3}n$ bits can be found in polynomial time.

We will now improve on this result by unravelling the linearization of g. 


Step 3: Unravel $g$'s linearization.
We unravel the linearization by back-substitution of $x_1^2 = u + x_2$. This slightly changes our lattice basis (see Fig. 2).

Fig. 2. After unravelling the linearization

The main difference is that the determinant of the new lattice $L_u$ increases by a factor of $X$. Thus our enabling condition $\det(L_u) > 1$ yields $U^4X^3 < N^4$ or equivalently $\delta < \frac{4}{11}$. This means that if the PRG outputs $\frac{7}{11}n$ of the bits in each of two iterations, then we can reconstruct the remaining $\frac{4}{11}n$ bits of both iterations in polynomial time. This beats the previous bound of $\frac{2}{3}n$.

We would like to stress again that our approach is heuristic. We construct two polynomials $h_1, h_2$.¹ The polynomials $h_1, h_2$ contain a priori three variables $x_1, x_2, u$, but substituting $u$ by $x_1^2 - x_2$ results in two bivariate polynomials $h_1', h_2'$. Then, we hope that $h_1'$ and $h_2'$ are coprime and thus allow for efficient root finding. We verified this heuristic with experiments in Section 7.
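The two enabling conditions of this section reduce to a one-line computation once $X = N^{\delta}$ and $U = N^{2\delta}$ are plugged in. A minimal sketch with exact fractions (the helper name is illustrative, not from the paper):

```python
from fractions import Fraction

def delta_bound(x_power, u_power, n_power):
    """Solve U^u_power * X^x_power < N^n_power for delta,
    with X = N^delta and U = N^(2*delta)."""
    return Fraction(n_power, x_power + 2 * u_power)

before = delta_bound(4, 4, 4)  # U^4 X^4 < N^4, plain linearization
after = delta_bound(3, 4, 4)   # U^4 X^3 < N^4, after unravelling

print("unknown fraction before unravelling:", before)
print("unknown fraction after unravelling: ", after)
print("output fraction needed by the attack:", 1 - after)
```

Lowering the exponent of $X$ by one is the whole effect of replacing the monomial $x_1^2$ by $x_2$ in this toy lattice.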


4 Generalization to Lattices of Arbitrary Dimension 


The linearization step from $f(x_1, x_2)$ to $g(u, x_1)$ is done as in the previous section using $u := x_1^2 - x_2$. For the basis construction step, we fix an integer $m$ and define the following collection of polynomials

$$g_{i,j}(u, x_1) = x_1^j g^i(u, x_1) \quad \text{for } i = 1, \ldots, m \text{ and } j = 0, \ldots, m-i. \qquad (1)$$

In the unravelling step, we substitute each occurrence of $x_1^2$ by $u + x_2$ and change the lattice basis accordingly. It remains to compute the determinant of the resulting lattice. This appears to be a non-trivial task due to the various back-substitutions. Therefore, we did not compute the lattice determinant as a function of $m$ by hand. Instead, we developed an automated process that might be useful in other contexts as well.

We observe that the determinant can be calculated by knowing first the product
of all monomials that appear in the collection of the $g_{i,j}$ after unravelling,
and second the product of all $N$. Let us start with the product of the $N$, since
it is easy to compute from Equation (1):

$$\prod_{i=1}^{m}\ \prod_{j=0}^{m-i} N^{i} = N^{\frac{1}{6}m^3 + o(m^3)}.$$

¹ The polynomial $h_2$ can be constructed from another reduced basis vector with a
slightly more restrictive condition on $\det(L)$ coming from Theorem 1. However, in
practical experiments the simpler condition $\det(L) > 1$ seems to suffice for $h_2$ as
well. In the subsequent sections, this minor detail is captured by the asymptotic analysis.

Now let us bound the product of all monomials. Each variable $x_1, x_2, u$ appears
in the unravelled form of $g_{i,j}$ with power at most $2m$. Therefore, the product of
all monomials that appear in all $\frac{1}{2}m^2 + o(m^2)$ polynomials has in each variable
degree at most $m^3$. Thus, we can express the exponent of each variable as a
polynomial function in $m$ of degree 3 with rational coefficients, similar to the
exponent of $N$.

But since we know that the exponents are polynomials in $m$ of degree at
most 3, we can uniquely determine them by a polynomial interpolation at 4
points. Namely, we explicitly compute the unravelled basis for $m = 1,\dots,4$ and
count the number of variables that occur in the unravelled forms of the $g_{i,j}$.
From these values, we interpolate the polynomial function for arbitrary $m$.
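
The interpolation idea can be illustrated on the one exponent whose closed form is easy to check, the exponent of $N$ from Equation (1). The sketch below is not the authors' SAGE code; the function names are hypothetical. It counts the exponent for $m = 1,\dots,4$ and recovers the cubic with exact rational arithmetic. The variable exponents would be handled the same way once the unravelled basis has been computed explicitly.

```python
from fractions import Fraction

def n_exponent(m):
    """Exponent of N in the determinant: sum of i over all g_{i,j}
    with i = 1..m and j = 0..m-i (the collection from Equation (1))."""
    return sum(i for i in range(1, m + 1) for _ in range(m - i + 1))

def interpolate_cubic(points):
    """Coefficients [c0, c1, c2, c3] of the cubic through four (m, value)
    points, via Gauss-Jordan elimination on the Vandermonde system."""
    rows = [[Fraction(x) ** k for k in range(4)] + [Fraction(y)] for x, y in points]
    for col in range(4):
        pivot = next(r for r in range(col, 4) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(4):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[4] for row in rows]

def evaluate(coeffs, m):
    """Evaluate the interpolated polynomial at m."""
    return sum(c * m ** k for k, c in enumerate(coeffs))

coeffs = interpolate_cubic([(m, n_exponent(m)) for m in range(1, 5)])
print(coeffs)  # leading coefficient is the last entry
```

The last coefficient is the leading term that enters the asymptotic condition; evaluating the interpolated polynomial at a fifth point and comparing with a direct count is a cheap sanity check.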

This technique is much less error-prone than computing the determinant
functions by hand and it allows for analyzing very complicated lattice basis
structures. Applying this interpolation process to our unravelled lattice basis, we
obtain $\det(L) = X^{-p_1(m)}\, U^{-p_2(m)}\, N^{p_3(m)}$ with

$$p_1(m) = \tfrac{1}{12}m^3 + o(m^3), \quad p_2(m) = \tfrac{1}{6}m^3 + o(m^3), \quad p_3(m) = \tfrac{1}{6}m^3 + o(m^3).$$

Our condition $\det(L) > 1$ thus translates into $5\delta < 2$ resp. $\delta < \frac{2}{5}$. Interestingly,
this is exactly the bound that Blackburn et al. conjectured to be the best
possible bound one can obtain by looking at two iterations of the PRG.

In the next section, we will also generalize our result to an arbitrary fixed 
number of iterations of the PRG. This should intuitively help to further improve 
the bounds and this intuition turns out to be true. To the best of our knowledge, 
our attack is the first one that is capable of exploiting more than two equations 
in the context of PRGs.


5 Using an Arbitrary Fixed Number of PRG Iterations 


We illustrate the basic idea of generalizing to more iterations by using three 
iterations of the generator before analyzing the general case. 

Let $s_i = k_i + x_i$ for $i = 1,2,3$, where the $k_i$ are the output bits and the $x_i$ are
unknown. For these values, we are able to use two iterations of the recurrence
relation, namely

$$s_2 = s_1^2 \bmod N, \qquad s_3 = s_2^2 \bmod N,$$

from which we derive two polynomials

$$f_1:\ \underbrace{x_1^2 - x_2}_{u_1} + \underbrace{2k_1}_{a_1}\, x_1 + \underbrace{k_1^2 - k_2}_{b_1} = 0 \bmod N$$

$$f_2:\ \underbrace{x_2^2 - x_3}_{u_2} + \underbrace{2k_2}_{a_2}\, x_2 + \underbrace{k_2^2 - k_3}_{b_2} = 0 \bmod N.$$

We perform the linearization step $f_1 \to g_1$ and $f_2 \to g_2$ by using the substitutions
$u_1 := x_1^2 - x_2$ and $u_2 := x_2^2 - x_3$.
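
As a quick numerical sanity check of these two relations, the following sketch splits three consecutive generator states into a known part $k_i$ and a small hidden part $x_i$ and confirms that $f_1$, $f_2$ and their linearized forms vanish modulo $N$. The modulus, the seed and the number of hidden bits are toy values chosen only for illustration.

```python
# Toy parameters (assumptions, far too small for real use).
N = 1009 * 1013          # stand-in for an RSA-type modulus
HIDDEN_BITS = 6          # low-order bits kept secret in each iteration

def split(s):
    """Write s = k + x, with x the hidden low bits and k the output part."""
    x = s % (1 << HIDDEN_BITS)
    return s - x, x

s1 = 123456
s2 = pow(s1, 2, N)
s3 = pow(s2, 2, N)
(k1, x1), (k2, x2), (k3, x3) = split(s1), split(s2), split(s3)

def f(k_cur, k_next, x_cur, x_next):
    """x_cur^2 - x_next + 2*k_cur*x_cur + k_cur^2 - k_next (mod N)."""
    return (x_cur**2 - x_next + 2 * k_cur * x_cur + k_cur**2 - k_next) % N

def g(k_cur, k_next, u, x_cur):
    """Linearized form u + a*x_cur + b with a = 2*k_cur, b = k_cur^2 - k_next."""
    return (u + 2 * k_cur * x_cur + k_cur**2 - k_next) % N

u1, u2 = x1**2 - x2, x2**2 - x3
residues = (f(k1, k2, x1, x2), f(k2, k3, x2, x3),
            g(k1, k2, u1, x1), g(k2, k3, u2, x2))
print(residues)
```

All four residues vanish because $f_i$ is just $s_i^2 - s_{i+1}$ rewritten in terms of $k_i$ and $x_i$.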
[Figure: lattice basis matrix]

Fig. 3. Generic lattice basis for 2 polynomials


In the basis construction step, we have to define a collection for the
polynomials $g_1(u_1, x_1)$ and $g_2(u_2, x_2)$ using suitable shifts and powers. We will start by
doing this in some generic but non-optimal way, which is depicted in Figure 3
for the case of fixed total degree $m = 2$ in $g_1, g_2$. In this basis matrix for better
readability we leave out the left-hand diagonal consisting of the inverses of the
upper bounds of the corresponding monomials.

The reader may verify that the bound obtained from this collection of
polynomials is $\delta < \frac{4}{11} \approx 0.364$, which is exactly the same bound as in our starting
example in Section 3. A bit surprisingly, our generic lattice basis construction does not
immediately improve on the bound that we derived from a single polynomial.

It turns out, however, that we improve when taking just a small subset of the
collection in Fig. 3. If we only use the shifts $g_1, x_1g_1, g_1^2$ and additionally $g_2$, then
we obtain a superior bound of $\delta < \frac{5}{13} \approx 0.385$. The reason for the improvement
comes from the fact that the monomial $x_2$ of $g_2$ can be reused as it already
appeared in the shifts $x_1g_1$ and $g_1^2$.

For the asymptotic analysis, we define the following collection of polynomials

$$g_{i,j,k} := x_1^{k}\, g_1^{i}\, g_2^{j} \quad \text{for } i = 0,\dots,m,\ \ j = 0,\dots,\left\lfloor \tfrac{m-i}{2} \right\rfloor,\ \ k = 0,\dots,m-i-2j \ \text{ with } i + j \ge 1.$$

The intuition behind the definition of this collection of polynomials follows the
same reasoning as in the example for $m = 2$. We wish to keep small the number
of new monomials introduced by the shifts with $g_2$. Notice that the monomials $x_2^i$
for $i = 0,\dots,\lfloor \frac{m}{2} \rfloor$ already appeared in the $g_1$ shifts, since we back-substituted
$x_1^2 \to u_1 + x_2$. Therefore, it is advantageous to use the $g_2$ shifts only up to $\lfloor \frac{m}{2} \rfloor$.

With the interpolation technique introduced in Section 4 we derive a bound
of $\delta < \frac{6}{13}$ for the case of 2 polynomials, i.e. three output values of the generator.


5.1 Arbitrary Number of PRG Iterations 


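Given $n + 1$ iterations of the PRG, we select a collection of shift polynomials
following the intuition given in the previous section:

$$g_{i_1,\dots,i_n,k} := x_1^{k}\, g_1^{i_1} \cdots g_n^{i_n}$$

$$\text{for } \begin{array}{l}
i_1 = 0,\dots,m\\
i_2 = 0,\dots,\left\lfloor \frac{m-i_1}{2} \right\rfloor\\
\quad\vdots\\
i_n = 0,\dots,\left\lfloor \frac{m-\sum_{j=1}^{n-1} 2^{j-1} i_j}{2^{n-1}} \right\rfloor\\
k = 0,\dots,m-\sum_{j=1}^{n} 2^{j-1} i_j
\end{array} \quad \text{with } i_1 + \dots + i_n \ge 1.$$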


To perform the asymptotic analysis we need to determine the value of the
determinant of the corresponding lattice basis. This means we have to count the
exponents of all occurring monomials in the set of shift polynomials. We would
like to point out that because of the range of the index $k$, the shifts with $x_1^k$
do not introduce additional monomials over the set defined by the product of
the $g_i$ alone. For this product the monomials can be enumerated as follows (see
Appendix A for a proof):

$$x_1^{a_1} \cdots x_n^{a_n}\ u_1^{i_1-a_1} \cdots u_{n-1}^{i_{n-1}-a_{n-1}}\ u_n^{i_n-2b_n-a_n}\ x_{n+1}^{b_n}$$

$$\text{with } \begin{array}{ll}
i_1 = 0,\dots,m & a_1 = 0,1\\
i_2 = 0,\dots,\left\lfloor \frac{m-i_1}{2} \right\rfloor & a_2 = 0,1\\
\quad\vdots & \quad\vdots\\
i_n = 0,\dots,\left\lfloor \frac{m-\sum_{j=1}^{n-1} 2^{j-1} i_j}{2^{n-1}} \right\rfloor & a_n = 0,1\\
b_n = 0,\dots,\left\lfloor \frac{i_n-a_n}{2} \right\rfloor. &
\end{array}$$


We are only interested in the asymptotic behavior, i.e. we just consider the
highest power of $m$. We omit the floor function as it only influences a lower
order term. Analogously, we simplify the exponents of $u_j$ by omitting the value
$a_j$, since it is a constant polynomial in $m$. Furthermore, for the same reason the
contribution to the determinant of all $x_i$ with $i \le n$ can be neglected.
To derive the final condition, we have to compute the polynomials $p_j(m)$ of
the following expression for the determinant (resp. the coefficients of the highest
power of $m$):

$$\det(L) = X_{n+1}^{-p_x(m)}\ \prod_{j=1}^{n} U_j^{-p_j(m)}\ N^{p_N(m)}.$$

It seems to be a complicated task to compute these polynomials explicitly.
Therefore, we follow a different approach and compute the sizes of their leading
coefficients in relation to each other. This turns out to be enough to derive a bound on
the sizes of the unknowns. In Appendix B we explain how to derive the following
expressions for the polynomials:

$$p_j(m) = \frac{1}{2^{j-1}}\, p_1(m) \ \text{ for } j \le n, \qquad p_x(m) = \frac{1}{2^{n}}\, p_1(m), \qquad p_N(m) = \frac{2^{n}-1}{2^{n-1}}\, p_1(m),$$

where we again omit low order terms. We use these expressions in the enabling
condition $\det(L) > 1$ and plug in upper bounds $X_{n+1} \le N^{\delta}$ and $U_j \le N^{2\delta}$. It is
sufficient to consider the condition for the exponents:

$$\delta\, \frac{1}{2^{n}}\, p_1(m) + 2\delta \sum_{j=1}^{n} \frac{1}{2^{j-1}}\, p_1(m) < \frac{2^{n}-1}{2^{n-1}}\, p_1(m).$$

Solving for $\delta$ gives

$$\delta < \frac{2^{n+1}-2}{2^{n+2}-3},$$

which converges for $n \to \infty$ to $\delta < \frac{1}{2}$.
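
To see how quickly the bound approaches $\frac{1}{2}$, the exponent condition can be evaluated for small $n$. The sketch below takes the relations between the leading coefficients as given (they are justified in Appendix B), normalizes $p_1 = 1$, and uses a hypothetical helper name.

```python
from fractions import Fraction

def asymptotic_bound(n):
    """Bound on delta for n polynomials (n + 1 PRG outputs), exponent e = 2."""
    # Leading coefficients relative to p_1 = 1, taken from the text.
    p_u = [Fraction(1, 2 ** (j - 1)) for j in range(1, n + 1)]   # p_j, j = 1..n
    p_x = Fraction(1, 2 ** n)
    p_N = Fraction(2 ** n - 1, 2 ** (n - 1))
    # Exponent condition: delta * p_x + 2 * delta * sum(p_j) < p_N.
    return p_N / (p_x + 2 * sum(p_u))

for n in (1, 2, 3, 10):
    bound = asymptotic_bound(n)
    print(n, bound, float(bound))
```

For $n = 1$ and $n = 2$ this gives the two-output and three-output bounds of Sections 4 and 5; every value stays strictly below $\frac{1}{2}$.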


6 Extending to Higher Powers 


In the previous sections, we have considered PRGs with exponent $e = 2$ only,
i.e. a squaring operation in the recurrence relation. A generalization to arbitrary
exponents is straightforward.

Suppose the PRG has the recurrence relation $s_2 = s_1^e \bmod N$. Let, as in
Section 3, the output of the generator be $k_1, k_2$, i.e. we have $s_1 = k_1 + x_1$ and
$s_2 = k_2 + x_2$, for some unknown $x_1, x_2$.

Using the recurrence relation, this yields the polynomial equation

$$\underbrace{x_1^e - x_2}_{u} + e k_1 x_1^{e-1} + \dots + e k_1^{e-1} x_1 + \underbrace{k_1^e - k_2}_{b} = 0 \bmod N.$$

The linearization step is analogous to the case where $e = 2$; however, the unravelling
of the linearization only applies for higher powers of $x_1$, in this case $x_1^e$.
The collection of shift polynomials using $n$ PRG iterations is

$$g_{i_1,\dots,i_n,k} := x_1^{k}\, g_1^{i_1} \cdots g_n^{i_n}$$

$$\text{for } \begin{array}{l}
i_1 = 0,\dots,m\\
\quad\vdots\\
i_n = 0,\dots,\left\lfloor \frac{m-\sum_{j=1}^{n-1} e^{j-1} i_j}{e^{n-1}} \right\rfloor\\
k = 0,\dots,m-\sum_{j=1}^{n} e^{j-1} i_j
\end{array} \quad \text{with } i_1 + \dots + i_n \ge 1.$$

Taking a closer look at the analysis in Appendix A and B shows that the
generalization for arbitrary $e$ is straightforward. Working through the analysis we obtain
for arbitrary $e$ an asymptotic bound for an arbitrary number of polynomials of
$\delta < \frac{1}{e}$.
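
The polynomial equation for general $e$ is the binomial expansion of $(k_1 + x_1)^e - (k_2 + x_2)$. A short check with toy values makes the middle terms explicit; the modulus, seed, number of hidden bits and the function name are assumptions chosen for illustration only.

```python
from math import comb

N = 1009 * 1013          # toy modulus (assumption)
HIDDEN_BITS = 6          # hidden low-order bits per iteration (assumption)

def split(s):
    """Write s = k + x, with x the hidden low bits and k the output part."""
    x = s % (1 << HIDDEN_BITS)
    return s - x, x

def power_relation(e, k1, k2, x1, x2):
    """x1^e - x2 + sum_{t=1}^{e-1} C(e,t) k1^t x1^(e-t) + k1^e - k2 (mod N)."""
    middle = sum(comb(e, t) * k1**t * x1 ** (e - t) for t in range(1, e))
    return (x1**e - x2 + middle + k1**e - k2) % N

results = {}
for e in (2, 3, 5):
    s1 = 123456
    s2 = pow(s1, e, N)
    (k1, x1), (k2, x2) = split(s1), split(s2)
    results[e] = power_relation(e, k1, k2, x1, x2)

print(results)
```

For $e = 2$ the middle sum reduces to the single term $2k_1x_1$ of the squaring case.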


7 Experiments 


Since our technique uses a heuristic concerning the algebraic independence of the
obtained polynomials, we have to experimentally verify our results. Therefore,
we implemented the unravelled linearization using SAGE 3.4.1, including the $L^2$
reduction algorithm from Nguyen and Stehlé [2]. In Table 1 some experimental
results are given for a PRG with $e = 2$ and a 256 bit modulus $N$.


Table 1. Experimental Results for e = 2

polys   m   δ (expected)   δ (verified)
1       4   0.377          0.364
1       6   0.383          0.377
1       8   0.387          0.379
2       4   0.405          0.390
2       6   0.418          0.408
3       4   0.407          0.400


In the first column we denote the number of polynomials. The second column 
shows the chosen parameter m, which has a direct influence on how close we 
approach the asymptotic bound. On the other hand, the parameter m increases 
the lattice dimension and therefore the time required to compute a solution. The 
theoretically expected δ is given in the third column, whereas the actually verified
δ is given in the fourth column. The last column denotes the time required to
find the solution on a Core2 Duo 2.2 GHz running Linux 2.6.24. 

It is worth mentioning that most of the time to find the solution is not spent
on the lattice reduction, but on extracting the common root from the
set of polynomials using resultant computations. The resultant computations
yielded the desired solutions of the power generators.
A Describing the Set of Monomials


Theorem 1 Suppose we have $n$ polynomials of the form

$$f_i(x_i, x_{i+1}) = x_i^2 + a_i x_i + b_i - x_{i+1}$$

and define the collection of shift polynomials as in Section 5.1.
After performing the substitutions $x_i^2 \mapsto u_i + x_{i+1}$, the set of all occurring
monomials can be described as

$$x_1^{a_1} \cdots x_n^{a_n}\ u_1^{i_1-a_1} \cdots u_{n-1}^{i_{n-1}-a_{n-1}}\ u_n^{i_n-2b_n-a_n}\ x_{n+1}^{b_n}$$

$$\text{with } \begin{array}{ll}
i_1 = 0,\dots,m & a_1 = 0,1\\
i_2 = 0,\dots,\left\lfloor \frac{m-i_1}{2} \right\rfloor & a_2 = 0,1\\
\quad\vdots & \quad\vdots\\
i_n = 0,\dots,\left\lfloor \frac{m-\sum_{j=1}^{n-1} 2^{j-1} i_j}{2^{n-1}} \right\rfloor & a_n = 0,1\\
b_n = 0,\dots,\left\lfloor \frac{i_n-a_n}{2} \right\rfloor. &
\end{array}$$


Proof. By induction. Basic step: $n = 1$.
For one polynomial $f_1(x_1, x_2) = x_1^2 + a_1 x_1 + b_1 - x_2$ we perform the substitution
$x_1^2 \mapsto u_1 + x_2$ to obtain $g_1(u_1, x_1) = u_1 + a_1 x_1 + b_1$. The set of all monomials
that are introduced by the powers of $g_1(u_1, x_1)$ can be described as

$$x_1^{j_1}\, u_1^{i_1-j_1} \quad \text{for } i_1 = 0,\dots,m \ \text{ and } \ j_1 = 0,\dots,i_1.$$

It remains to perform the substitution on this set. Therefore, we express the
counter $j_1$ by two counters $a_1$ and $b_1$ and let $j_1 = 2b_1 + a_1$, i.e. we write the set
as

$$(x_1^2)^{b_1}\, x_1^{a_1}\, u_1^{i_1-2b_1-a_1} \quad \text{for } i_1 = 0,\dots,m,\ \ a_1 = 0,1,\ \ b_1 = 0,\dots,\left\lfloor \tfrac{i_1-a_1}{2} \right\rfloor.$$

Imagine that we enumerate the monomials for fixed $i_1, a_1$ and increasing $b_1$,
and simultaneously perform the substitution $x_1^2 \mapsto u_1 + x_2$. The key point to
notice is that all monomials that occur after the substitution, i.e. all of
$(u_1 + x_2)^{b_1}\, x_1^{a_1}\, u_1^{i_1-2b_1-a_1}$, have been enumerated by a previous value of $b_1$, except for
the single monomial $x_2^{b_1}\, x_1^{a_1}\, u_1^{i_1-2b_1-a_1}$.
Thus, the set of monomials after the substitution can be expressed as

$$x_2^{b_1}\, x_1^{a_1}\, u_1^{i_1-2b_1-a_1} \quad \text{for } i_1 = 0,\dots,m,\ \ a_1 = 0,1,\ \ b_1 = 0,\dots,\left\lfloor \tfrac{i_1-a_1}{2} \right\rfloor.$$

This concludes the basic step.
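
The basic step can be cross-checked by brute force for small $m$. The sketch below represents a monomial $x_1^{\alpha} u_1^{\beta} x_2^{\gamma}$ as the exponent triple $(\alpha, \beta, \gamma)$, performs the substitution $x_1^2 \mapsto u_1 + x_2$ on every monomial $x_1^{j_1} u_1^{i_1-j_1}$, and compares the result with the closed description. It only tracks which monomials occur, not their coefficients, and the function names are illustrative.

```python
def unravelled_monomials(m):
    """Exponent triples (x1, u1, x2) obtained from x1^j * u1^(i-j),
    0 <= j <= i <= m, after replacing every x1^2 by u1 + x2."""
    result = set()
    for i in range(m + 1):
        for j in range(i + 1):
            b, a = divmod(j, 2)                 # j = 2*b + a with a in {0, 1}
            # (u1 + x2)^b contributes u1^(b-c) * x2^c for c = 0..b.
            for c in range(b + 1):
                result.add((a, i - j + b - c, c))
    return result

def claimed_monomials(m):
    """Closed description x1^a * u1^(i-2b-a) * x2^b from the basic step."""
    return {
        (a, i - 2 * b - a, b)
        for i in range(m + 1)
        for a in (0, 1)
        for b in range((i - a) // 2 + 1)        # empty when i - a < 0
    }

for m in range(7):
    print(m, len(claimed_monomials(m)), unravelled_monomials(m) == claimed_monomials(m))
```

For $m = 2$ both sets consist of the six monomials $1, x_1, u_1, x_2, x_1u_1, u_1^2$.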


Inductive Step: $n - 1 \to n$
Suppose the assumption is correct for $n - 1$ polynomials. By the construction of
the shift polynomials and the induction hypothesis, we have the set of monomials

$$\underbrace{x_1^{a_1} \cdots x_{n-1}^{a_{n-1}}\ u_1^{i_1-a_1} \cdots u_{n-2}^{i_{n-2}-a_{n-2}}\ u_{n-1}^{i_{n-1}-2b_{n-1}-a_{n-1}}\ x_n^{b_{n-1}}}_{\text{Hypothesis}}\ \ \underbrace{x_n^{j_n}\, u_n^{i_n-j_n}}_{f_n}$$

$$\text{for } \begin{array}{ll}
i_1 = 0,\dots,m & a_1 = 0,1\\
i_2 = 0,\dots,\left\lfloor \frac{m-i_1}{2} \right\rfloor & a_2 = 0,1\\
\quad\vdots & \quad\vdots\\
i_{n-1} = 0,\dots,\left\lfloor \frac{m-\sum_{j=1}^{n-2} 2^{j-1} i_j}{2^{n-2}} \right\rfloor & a_{n-1} = 0,1\\
b_{n-1} = 0,\dots,\left\lfloor \frac{i_{n-1}-a_{n-1}}{2} \right\rfloor. &
\end{array}$$

By adding the $n$-th polynomial, we also get the new relation $x_n^2 = u_n + x_{n+1}$.
Before performing the substitutions, however, we have to take a closer look at
the powers of $x_n$. The problem seems to be that we have a contribution from
the $n$-th polynomial as well as from some previous substitutions. It turns out
that this can be handled quite elegantly. Namely, we will show that all occurring
monomials are enumerated by just taking $b_{n-1} = 0$.

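Consider the set of monomials for $b_{n-1} = c$ for some constant $c \ge 1$:

$$x_1^{a_1} \cdots u_{n-1}^{i_{n-1}-2c-a_{n-1}}\ x_n^{c+j_n}\ u_n^{i_n-j_n} \quad \text{for } j_n \in \{0,\dots,i_n\}.$$

Exactly the same set of monomials is obtained by considering the index
$i_{n-1}' = i_{n-1} - 2$ and $b_{n-1} = c - 1$. Notice that in this case the counter $i_n'$, which serves
as an upper bound of $j_n'$, runs from 0 through

$$\left\lfloor \frac{m-\sum_{j=1}^{n-2} 2^{j-1} i_j - 2^{n-2}(i_{n-1}-2)}{2^{n-1}} \right\rfloor = \left\lfloor \frac{m-\sum_{j=1}^{n-1} 2^{j-1} i_j}{2^{n-1}} \right\rfloor + 1 = i_n + 1.$$

Thus, we have the same set of monomials as with $b_{n-1} = c - 1$:

$$x_1^{a_1} \cdots u_{n-1}^{i_{n-1}'-2(c-1)-a_{n-1}}\ x_n^{(c-1)+j_n'}\ u_n^{i_n'-j_n'} \quad \text{for } j_n' \in \{0,\dots,i_n'\}.$$

Iterating this argument, we conclude that all monomials are enumerated by
$b_{n-1} = 0$.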

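Having combined the occurring powers of $x_n$, we continue by performing a step
analogous to the basic step: introduce $a_n$ and $b_n$ representing $j_n$. This leads
to

$$x_1^{a_1} \cdots x_{n-1}^{a_{n-1}}\ u_1^{i_1-a_1} \cdots u_{n-1}^{i_{n-1}-a_{n-1}}\ (x_n^2)^{b_n}\, x_n^{a_n}\, u_n^{i_n-2b_n-a_n}$$

$$\text{for } \begin{array}{ll}
i_1 = 0,\dots,m & a_1 = 0,1\\
i_2 = 0,\dots,\left\lfloor \frac{m-i_1}{2} \right\rfloor & a_2 = 0,1\\
\quad\vdots & \quad\vdots\\
i_n = 0,\dots,\left\lfloor \frac{m-\sum_{j=1}^{n-1} 2^{j-1} i_j}{2^{n-1}} \right\rfloor & a_n = 0,1\\
b_n = 0,\dots,\left\lfloor \frac{i_n-a_n}{2} \right\rfloor. &
\end{array}$$

Finally we substitute $x_n^2 = u_n + x_{n+1}$. Using the same argument as in the basic
step, we note that new monomials only appear for powers of $x_{n+1}$.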


B Relations among Exponent Polynomials 


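For the determinant computation we need to sum up the exponents of the
occurring monomials. Take for example $u_\ell$ with $\ell < n$: using the description of the
set from Appendix A we need to compute

$$\sum_{i_1=0}^{m}\ \sum_{a_1=0}^{1}\ \sum_{i_2=0}^{\left\lfloor \frac{m-i_1}{2} \right\rfloor}\ \sum_{a_2=0}^{1} \cdots \sum_{i_n=0}^{\left\lfloor \frac{m-\sum_{j=1}^{n-1} 2^{j-1} i_j}{2^{n-1}} \right\rfloor}\ \sum_{a_n=0}^{1}\ \sum_{b_n=0}^{\left\lfloor \frac{i_n-a_n}{2} \right\rfloor} (i_\ell - a_\ell).$$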

We will step by step simplify this expression using the fact that in the asymptotic 
consideration only the highest power of the parameter m is important. 

In the first step we notice that we may remove the $-a_\ell$ from the summation,
because $a_\ell$ does not depend on $m$, while $i_\ell$ does. Therefore, the $a_\ell$ just affects
lower order terms. With the same argument we can omit the $a_n$ in the upper
bound of the sum over $b_n$. Further, the floor function in the limit of the sums
only affects lower order terms and therefore may be omitted. Next, we can
move all the sums of the $a_j$ to the front, since they are no longer referenced
anywhere, and replace each of these sums by a factor of 2, making altogether a
global factor of $2^n$.

For further simplification of the expression, we wish to eliminate the fractions
that appear in the bounds of the sums. To give an idea how to achieve this,
consider the expression

$$\sum_{i_1=0}^{m}\ \sum_{i_2=0}^{\left\lfloor \frac{m-i_1}{2} \right\rfloor} i_2.$$

Our intuition is to imagine an index $i_2'$ of the second sum that performs steps
with a width of 2 and is upper bounded by $m - i_1$. To keep it equivalent, we
have to compute the sum over all integers of the form $\lfloor i_2'/2 \rfloor$. However, when
changing the index to $i_2'$, the sum surely does not perform steps with width 2,
i.e. we count every value exactly twice. Thus, to obtain a correct reformulation,
we have to divide the result by 2. Note that asymptotically we may omit the
floor function and simply sum over $i_2'/2$.

In the same way we are able to reformulate all sums from $i_1$ to $i_n$. For better
readability we replaced $i_j'$ with $i_j$ again.

$$2^n\ \prod_{j=1}^{n} \frac{1}{2^{j-1}}\ \sum_{i_1=0}^{m}\ \sum_{i_2=0}^{m-i_1} \cdots \sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j}\ \sum_{b_n=0}^{i_n/2^n} \frac{i_\ell}{2^{\ell-1}} \tag{2}$$

It seems to be a complicated task to explicitly evaluate a sum of this form.
Therefore, we follow a different approach, namely we relate the sums over different $i_\ell$
to each other. We start with the discussion of a slightly simpler observation:

Sums of the form $\sum_{i_1=0}^{m} \cdots \sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j} i_\ell$ are equal for all $\ell \le n$.

An explanation can be given as follows. Imagine the geometric object that is
represented by taking the $i_j$ as coordinates in an $n$-dimensional space. This set
describes an $n$-dimensional simplex, e.g. a triangle for $n = 2$, a tetrahedron for
$n = 3$, etc. Considering its regular structure, i.e. the symmetry in the different
coordinates, it should be clear that the summation over each of the $i_\ell$ results in
the same value.
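
The symmetry claim is easy to confirm numerically for small parameters. The following sketch enumerates all integer points of the simplex $i_1 + \dots + i_n \le m$ with non-negative coordinates, which is exactly the index set of the nested sums, and accumulates each coordinate separately; the function name is illustrative.

```python
from itertools import product

def simplex_sums(n, m):
    """For each coordinate l, the sum of i_l over all integer points with
    i_1, ..., i_n >= 0 and i_1 + ... + i_n <= m."""
    totals = [0] * n
    for point in product(range(m + 1), repeat=n):
        if sum(point) <= m:
            for l, value in enumerate(point):
                totals[l] += value
    return totals

print(simplex_sums(2, 5))
print(simplex_sums(3, 6))
```

Each call returns a list whose entries all coincide, one entry per coordinate.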

In the sum of Equation (2) there is an additional inner summation with index
$b_n$ and limit $i_n/2^n$. For the indices $\ell < n$ this innermost sum is constant for all
values of $i_\ell$ and thus with the previous argumentation the whole sums are equal
for all $\ell < n$. We only have to take care of the leading factors, i.e. the powers of
2 that came from replacing the summation variables.

This gives us already a large amount of the exponent polynomials in the
determinant expression. Namely, we are able to formulate the polynomials $p_\ell$
(which is the sum over the $i_\ell$) in terms of $p_1$ for all $\ell < n$. The difference is
exactly the factor $\frac{1}{2^{\ell-1}}$ that has been introduced when changing the index from
$i_\ell$ to $i_\ell'$.

For the exponent polynomial of the variable $u_n$, however, we have to be careful
because we do not compute the summation of $i_n - a_n$, but of $i_n/2^{n-1} - 2b_n - a_n$
instead ($i_n/2^{n-1}$ since we changed the summation index $i_n$). The value $-a_n$ can
be omitted with the same argument as before. To derive a relation of $p_n$ to $p_1$,
we start by evaluating the inner sums:

$$\sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j}\ \sum_{b_n=0}^{i_n/2^n} \left( \frac{i_n}{2^{n-1}} - 2b_n \right) = \sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j} \left( \frac{i_n}{2^n} \right)^2.$$

Notice that once again, for the asymptotic analysis we have only considered the
highest powers.

Because of the previously mentioned symmetry between $i_1$ and $i_n$, we finally
derive $p_n = \frac{1}{2^{n-1}}\, p_1$. The same argument can be used to derive the bound on the
variable $x_{n+1}$ for which we have to compute the sum

$$\sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j}\ \sum_{b_n=0}^{i_n/2^n} b_n = \sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j} \frac{1}{2} \left( \frac{i_n}{2^n} \right)^2.$$

The multiplicative relation between $p_1$ and $p_x$ is therefore $p_x = \frac{1}{2^n}\, p_1$.
Finally, to compute the exponent of $N$ in the determinant, we have to sum
up all exponents that occur in the enumeration of the shift polynomials given in
Section 5.1. The simplifications are equivalent to the ones used before and we
obtain:

$$p_N = \sum_{\ell=1}^{n}\ \prod_{j=1}^{n} \frac{1}{2^{j-1}}\ \sum_{i_1=0}^{m} \cdots \sum_{i_n=0}^{m-\sum_{j=1}^{n-1} i_j}\ \sum_{k=0}^{m-\sum_{j=1}^{n} i_j} \frac{i_\ell}{2^{\ell-1}}.$$

We first note that for $\ell < n$ we may write

$$\sum_{i_n=0}^{c}\ \sum_{k=0}^{c-i_n} 1 \quad \text{with } c = m - \sum_{j=1}^{n-1} i_j.$$

This is asymptotically equivalent to

$$\sum_{i_n=0}^{c}\ \sum_{k=0}^{i_n} 1 = 2^n \sum_{i_n=0}^{c}\ \sum_{k=0}^{i_n/2^n} 1.$$

For $\ell = n$ we argue again that the summations for different $i_\ell$ behave the same
way. Thus the term for $\ell = n$ also equals $\frac{1}{2^{n-1}}\, p_1$. Summing up, we obtain

$$p_N = \left( 1 + \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{2^{n-1}} \right) p_1 = \frac{2^n - 1}{2^{n-1}}\, p_1.$$
Abstract. By a computational puzzle we mean a mildly difficult
computational problem that requires resources (processor cycles, memory,
or both) to solve. Puzzles have found a variety of uses in security. In 
this paper we are concerned with client puzzles: a type of puzzle used 
as a defense against Denial of Service (DoS) attacks. The main contri- 
bution of this paper is a formal model for the security of client puzzles. 
We clarify the interface that client puzzles should offer and give two se- 
curity notions for puzzles. Both functionality and security are inspired 
by, and tailored to, the use of puzzles as a defense against DoS attacks. 
Our definitions fill an important gap: breaking either of the two proper- 
ties immediately leads to successful DoS attacks. We illustrate this point 
with an attack against a previously proposed puzzle construction. We 
also provide a generic construction of a client puzzle which meets our 
security definitions. 


1 Introduction 


A Denial of Service (DoS) attack on a server aims to render it unable to provide 
some service by depleting its internal resources. For example, the famous TCP- 
SYN flooding attack prevents further connections to a server by starting a 
large number of TCP sessions which are then left uncompleted. The effort of the 
attacker is rather small, whereas the server quickly runs out of resources (which 
are allocated to the unfinished sessions). 

One countermeasure against connection depletion DoS attacks uses client
puzzles [14]. When contacted by some unauthenticated, potentially malicious, client
to execute some protocol and before allocating any resources for the execution, 
the server issues a client puzzle — a moderately hard computational problem. 
The server only engages in the execution of the protocol (and thus allocates 
resources) when the client returns a valid solution to the puzzle. The idea is 
that the server spends its resources only after the client has spent a significant 
amount of resources itself. To avoid the burden of running the above mechanism 
when no attackers are present, the defense only kicks in whenever the server 
resources drop below a certain threshold. 
Client puzzles have received a lot of attention in the cryptographic
community, but most of the prior work consists of proposing puzzle
constructions and arguing that those constructions do indeed work. Although
sometimes technical, such security arguments are with respect to intuitive se- 
curity notions for puzzles since rigorous formal models for the security of such 
puzzles are missing. The absence of such models has (at least) two undesirable 
consequences. On the one hand the investigation of puzzle constructions usually 
concentrates on some security aspects and omits others which are of equal im- 
portance when puzzles are used as part of other protocols. More importantly, 
the absence of formal models prevents a rigorous, reduction-based analysis of the 
effectiveness of puzzles against DoS in the style of modern cryptography (where 
the existence of a successful DoS attacker implies the existence of an attacker 
against client puzzles). 

In this paper we aim to solve the first problem outlined above as a first key step 
towards solving the second one. The main contribution of this paper is a formal 
framework for the design and analysis of client puzzles. In addition to fixing their 
formal syntax, we design security notions inspired by, and therefore tailored for, 
the use of client puzzles as a defense against DoS attacks. Specifically, we require 
that an adversary cannot produce valid puzzles on his own (puzzle-unforgeability) 
and that puzzles are non-trivial — the client needs indeed to spend at least a 
specified amount of resources to solve them — (puzzle difficulty). The use of 
client puzzles that do not fulfill at least one of our notions immediately leads 
to a successful DoS attack. Our definitions use well-established intuition and 
techniques for defining one-wayness and authentication properties. Apart from 
some design decisions regarding the measure for resources and the precise oracles 
an adversary should have access to, there are no deep surprises here. However, 
we highlight that the lack of rigorous definitions such as those we put forward in 
this paper is dangerous. Constructions that are secure at an intuitive level may
in fact be insecure when used. Indeed, we explicitly demonstrate that a popular
construction that does not meet our notion of unforgeability does not protect
against, and in fact facilitates, DoS attacks in systems that use it.

Furthermore, we give a generic construction of a client puzzle that is secure in 
the sense we define. Many existing client puzzle constructions can be obtained 
as an instantiation of our generic construction, with only minor modifications if 
any. Our construction uses a pseudorandom function family to provide puzzle- 
unforgeability and puzzle-difficulty is obtained from a one-way function given a 
large part of the preimage. We prove our construction secure via an asymptotic 
reduction for unforgeability and a concrete reduction for difficulty. Next, we
discuss our results in more detail.


Our Contribution 


FORMAL SYNTAX OF A CLIENT PUZZLE. Our first contribution is a formal syn- 
tax for client puzzles. We define a client puzzle as a tuple of algorithms for sys- 
tem setup, puzzle generation, solution finding, puzzle authenticity checking, and 
solution checking. The definition is designed to capture the main functionality 
required from client puzzles from the perspective of their use against DoS
attacks.


SECURITY NOTIONS FOR CLIENT PUZZLES. The use of puzzles against DoS 
attacks also inspired the two (orthogonal) security notions for client puzzles 
that we design. 

To avoid storing puzzles handed out to clients (a resource consuming task), 
the server gives puzzles away and expects the client to hand back both the puzzle 
and its solution. Obviously, the server needs to be sure the client cannot produce 
puzzles on its own, as this would lead to trivial attacks. We remark that this 
aspect is often overlooked in existing constructions since it is only apparent when 
puzzles are considered in the precise context for which they are intended. We 
capture this requirement via the notion of puzzle-unforgeability. Formally, we 
define a security game where the adversary is given certain querying capabilities 
(he can for example request to see puzzles and their solutions, can verify the 
authenticity of puzzles, etc) and aims to output a new puzzle which the server 
deems as valid. 

The second notion, puzzle-difficulty, reflects the idea that the client needs to 
spend a certain amount of resources to solve a puzzle. In our definition we took 
adversary resources to mean “clock cycles”, as this design decision allows us to 
abstract away undesirable details like the distributed nature of many DoS adver- 
saries. We define a security game where the adversary is given various querying 
capabilities sufficient for mimicking a DoS attack-like environment: he can see puz- 
zles and their solutions, obtain solutions for puzzles he chooses, etc. The challenge 
for the adversary is to solve a given challenge puzzle spending less than a certain 
number of clock cycles, with probability better than a certain threshold. 


AN ATTACK ON THE JUELS AND BRAINARD PUZZLES. Most of the previous 
work on puzzles concentrates exclusively on the difficulty aspect and overlooks, 
or only partially considers, the unforgeability property. One such work is the 
puzzle construction proposed by Juels and Brainard [14]. We demonstrate the
usefulness of our definitions by showing the Juels and Brainard construction 
is forgeable. We then explain how a system using this kind of puzzle can be 
attacked by exploiting the weakness we have identified. 


GENERIC CONSTRUCTIONS. We provide a generic construction of a client
puzzle inspired by the Juels and Brainard sub-puzzle construction [14]. First, we
evaluate a pseudorandom function (PRF), keyed by some secret value, on inputs 
including a random nonce, hardness parameter and a system specific string. This 
stage ensures uniqueness of the puzzle and the desired unforgeability; only the 
server that possesses the hidden key is able to perform this operation and hence 
generate valid puzzles. The remaining information to complete the puzzle is then 
computed by evaluating a one way function (OWF), for which finding preimages 
has a given difficulty, on the output of the PRF; the goal in solving the puzzle is 
to find such a preimage given the inputs to the PRF and the target. The idea is 
that the client would need to do an exhaustive search on the possible preimage 
space to find such a preimage. We certify the intuition by rigorous proofs that
the generic construction meets the security definitions that we put forth, for
appropriately chosen parameters. Importantly, many secure variants of previously
proposed constructions can be obtained as instances of our generic construction.
For example, the puzzle construction proposed by Juels and Brainard [14] and
the two-party variant of the Waters et al. [28] puzzles can be seen as
variants of our generic construction. Finally, we provide concrete security bounds
for the first of these puzzles. We do so in the random oracle model which we use 
to obtain secure and efficient instantiations of the two primitives used by our 
generic construction. 
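
The two-stage idea (a keyed PRF for unforgeability, a one-way function with a partially revealed preimage for difficulty) can be sketched as follows. This is an illustrative toy and not the construction analyzed in this paper: HMAC-SHA256 stands in for the PRF, SHA-256 for the one-way function, and all function names, field names and encodings are assumptions made for the example.

```python
import hashlib
import hmac
import os

def generate_puzzle(key: bytes, hardness_bits: int, system_string: bytes):
    """Server: derive the preimage with a PRF and publish all but
    `hardness_bits` of it together with the one-way function target."""
    nonce = os.urandom(16)
    prf_input = nonce + hardness_bits.to_bytes(2, "big") + system_string
    preimage = hmac.new(key, prf_input, hashlib.sha256).digest()   # PRF stand-in
    target = hashlib.sha256(preimage).digest()                     # OWF stand-in
    value = int.from_bytes(preimage, "big")
    hint = value >> hardness_bits << hardness_bits                 # hide the low bits
    return {"nonce": nonce, "bits": hardness_bits, "string": system_string,
            "hint": hint, "target": target}

def solve_puzzle(puzzle):
    """Client: exhaustive search over the hidden low-order bits."""
    for guess in range(1 << puzzle["bits"]):
        candidate = (puzzle["hint"] | guess).to_bytes(32, "big")
        if hashlib.sha256(candidate).digest() == puzzle["target"]:
            return candidate
    return None

def is_authentic(key: bytes, puzzle) -> bool:
    """Server: recompute the PRF, so issued puzzles need not be stored."""
    prf_input = puzzle["nonce"] + puzzle["bits"].to_bytes(2, "big") + puzzle["string"]
    preimage = hmac.new(key, prf_input, hashlib.sha256).digest()
    value = int.from_bytes(preimage, "big")
    same_target = hmac.compare_digest(hashlib.sha256(preimage).digest(), puzzle["target"])
    return same_target and value >> puzzle["bits"] << puzzle["bits"] == puzzle["hint"]

def is_solution(puzzle, solution) -> bool:
    """Server: a solution is a preimage of the target."""
    return solution is not None and hashlib.sha256(solution).digest() == puzzle["target"]

key = os.urandom(32)
puzzle = generate_puzzle(key, 12, b"example-server")
solution = solve_puzzle(puzzle)
forged = dict(puzzle, nonce=os.urandom(16))     # puzzle not produced with the key
print(is_authentic(key, puzzle), is_solution(puzzle, solution), is_authentic(key, forged))
```

The authenticity check is what lets the server stay stateless: a puzzle that was not derived from the secret key fails it, regardless of whether the attached solution matches the target.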


Related Work 


MERKLE PUZZLES. The use of puzzles in cryptography was pioneered by Merkle
[18] who used puzzles to establish a secret key between parties over an
insecure channel. Since then the optimality of Merkle puzzles has been analyzed by
Impagliazzo and Rudich [2] and Barak and Mahmoody-Ghidary B]. The pos- 
sibility of basing weak public key cryptography on one-way functions, or some 
variant of them was recently explored by Biham, Goren and Ishai (J. Specifi- 
cally, a variant of Merkle’s protocol is suggested whose security is based on the 
one-wayness of the underlying primitive. 


CLIENT PUZZLES. Client puzzles were first introduced as a defense mechanism 
against DoS attacks by Juels and Brainard [14]. The construction they proposed 
uses hash function inversion as the source of puzzle-difficulty. They also attempt 
to obtain puzzle-unforgeability but partially fail in two respects. By neglecting 
the details of how puzzles are to be used against a DoS attack, the construction 
suffers from a flaw (which we explain how to exploit later in this paper) that 
can be used to mount a DoS attack. Secondly, despite intuitive claims that secu- 
rity is based on the one-wayness of the hash function used in the construction, 
security requires much stronger assumptions, namely one-wayness with partial 
information about the preimage. The authors also present a method to combine 
a key agreement or authentication protocol with a client puzzle, and present a 
set of informal desirable properties of puzzles. Building on this work, Aura et 
al. B] use the same client puzzle protocol construction but present a new client 
puzzle mechanism, also based on hash function inversion, and extend the set of 
desirable properties. 

An alternative method for constructing client puzzles and client puzzle proto- 
cols was proposed by Waters et al. [28]. This technique assumes the client puzzle 
protocol is a three party protocol and constructs a client puzzle based on the dis- 
crete logarithm problem for which authenticity and correctness can be verified us- 
ing a Diffie-Hellman based technique. One of the main advantages of this approach 
is that puzzle generation can be outsourced from the server to another external 
bastion, yet verification of solutions can be performed by the server itself. 

More recently Tritilanunt et al. [27] proposed a client puzzle based on the
subset sum problem. Schaller et al. [24] have also used what they refer to as
cryptographic puzzles for broadcast authentication in networks. 
An interesting line of work analyzes ways to construct stronger puzzles out of
weaker ones. The concept of chaining together client puzzles to produce a new 
and more difficult client puzzle was introduced by Groza and Petrica [I]. Their 
construction enforces a sequential solving strategy, and thus yields a harder puz- 
zle. A related work is that of Canetti, Halevi, and Steiner [5] who are concerned 
with relating the difficulty of solving one single puzzle to that of solving several 
independent puzzles. They consider the case of “weakly” verifiable puzzles (puz- 
zles for which the solution can only be checked by the party that produced the 
puzzles). That paper does not consider the use of puzzles in the context of DoS 
attacks, and thus is not concerned with authenticity. 


CLIENT PUZZLE PROTOCOLS. In an interesting paper that analyzes resistance of 
client puzzle protocols to man-in-the-middle attacks [21], Price concludes that 
in any secure protocol the server needs to resort to digital signatures. We note 
that such concerns are related but orthogonal to the goals that we pursue in this 
paper. Indeed, in prior literature there is no clear distinction between client 
puzzles (the problems that the server hands out for clients to solve) and client 
puzzle protocols (the ensemble that includes, in addition to the particular 
puzzles that are constructed, the way state is maintained by the server, the 
mechanism for deeming a puzzle as expired, etc.). We emphasize that in this paper 
we are mainly concerned with the former, so the results of [21] do not apply. 


DoS ATTACKS. A classification of remote DoS attacks and countermeasures, together 
with a brief consideration of Distributed Denial of Service (DDoS) attacks, was 
given by Karig and Lee [15]. Following this, Specht and Lee gave a classification 
of DDoS attacks, tools and countermeasures, in which the earlier adversarial 
model is extended to include Internet Relay Chat (IRC) based models. The authors 
also classify the types of software used for such attacks and the 
most common known countermeasures. Other classifications of DDoS attacks 
and countermeasures were later given by [MIJ]. 

A number of protocols have been designed to resist DDoS attacks. The most 
important examples are the JFK protocol [1] and the HIP protocol [20]. The 
JFK protocol of Aiello et al. [1] trades the forward secrecy property, known as 
adaptive forward secrecy, for denial of service resistance. The original protocol 
does not use client puzzles. The cost based technique of Meadows has since been 
used to analyze the JFK protocol: two denial of service attacks were found, and 
both can be prevented by introducing a client puzzle into the JFK protocol. 


SPAM AND TIME-LOCK CRYPTO. Other proposals for the use of puzzles include 
the work of Dwork and Naor, who propose to use a pricing function (a particular 
type of puzzle) to combat junk email [8]. The basic principle is that the pricing 
function costs a given amount of computation to compute, and this computation 
can be verified cheaply without any additional information. A service provider 
could then issue a “stamp duty” on bulk mailings. Finally, Rivest et al. 
introduced the notion of timed-release crypto and instantiated this notion with 
a time-lock puzzle. The overall goal of timed-release crypto is to encrypt a 
message such that nobody, even the sender, can decrypt it before a given length 
of time has passed. 


Paper Overview. We start with a sample client puzzle, due to Juels and Brainard 
[14], in Section 2. Our formal definition of a client puzzle and a client puzzle 
protocol is in Section 3. In Section 4 we give security notions for client 
puzzles in terms of unforgeability and difficulty. We demonstrate that the Juels 
and Brainard client puzzle is insecure in Section 5. Finally, our generic 
construction of a client puzzle is given in Section 6. We also include a sample 
instantiation based on hash functions which we analyze in the random oracle model. 


2 Juels and Brainard Puzzles 


To illustrate some of the basic ideas behind the construction of puzzles, we 
first give a brief description of the puzzle generation process for the Juels and 
Brainard construction [14]. In our description we refer to the (authorized) puzzle 
generation entity (or user) as the generator and the (authorized) puzzle solving 
entity (or user) as the solver. We use the term “puzzle” from here onwards 
for individual puzzle instances. We write $\{0,1\}^t$ for the set of binary 
strings of length $t$ and $\{0,1\}^*$ for the set of binary strings of arbitrary 
finite length. If $x = x_1, x_2, \ldots, x_i, \ldots, x_j, \ldots, x_n$ is a bit 
string then we let $x\langle i, j\rangle$ denote the substring $x_i, \ldots, x_j$. 

For this construction the generator (generally some server) holds a long term 
secret value $s$ chosen uniformly from a space large enough to prevent exhaustive 
key search attacks. The server also chooses a hardness parameter: a pair 
$Q = (\alpha, \beta) \in \mathbb{N}^2$ which ensures puzzles have a certain amount 
of difficulty to solve. We let $H : \{0,1\}^* \to \{0,1\}^m$ be some hash 
function. To generate a new puzzle the generator performs the following steps to 
compute the required sub-puzzle instances $P_j$ for $j \in \{1, 2, \ldots, \beta\}$: 

• A bit string $\sigma_j$ is computed as $\sigma_j = H(s, \mathrm{str}, j)$. The 
value str has the structure $\mathrm{str} = t \| M$ for $t$ some server time 
value¹ and $M$ some unique value². We denote 
$x_j = \sigma_j\langle 1, \alpha\rangle$ and 
$z_j = \sigma_j\langle \alpha + 1, m\rangle$. 

• A value $y_j$ is computed as $y_j = H(\sigma_j)$ and the sub-puzzle instance is 
$P_j = (z_j, y_j)$. 

The full puzzle instance is then the required parameters plus the tuple of 
sub-puzzle instances $\mathrm{puz} = (Q, \mathrm{str}, P = (P_1, P_2, \ldots, P_\beta))$. 
The sub-puzzle instance generation process is summarized in Figure 1. 

A solution to a given sub-puzzle $P_j$ is any string $x'_j$ such that 
$H(x'_j \| z_j) = y_j$. The solution to the full puzzle instance is a tuple of 
solutions to the sub-puzzles. To verify a potential solution 
$\mathrm{soln} = (Q, \mathrm{str}, \mathrm{soln}_1, \ldots, \mathrm{soln}_\beta)$ 
the generator verifies 


¹ The details of the type of value this is are not described in [14], but here we 
will assume this is a bit string. 

² In [14] this is specified as the first message flow of a protocol or some other 
unique data. Again, we will assume this is encoded as a bit string since this is 
not specified. 
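[Figure: from the generation parameters $s$, str, $j$ the generator computes 
$\sigma_j = H(s, \mathrm{str}, j)$ and splits it as $\sigma_j = x_j \| z_j$; the 
solution is $\mathrm{soln}_j = x_j$ and the sub-puzzle instance is 
$P_j = (z_j, y_j)$ with $y_j = H(\sigma_j)$.] 

Fig. 1. The Juels and Brainard Sub-Puzzle Instance Generation 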


each $P_j$ and $\mathrm{soln}_j$ by checking that 
$H(\mathrm{soln}_j \| z_j) = y_j$ for each $j$. The authenticity of a given puzzle 
is checked by regenerating each $P_j$ using $s$ and comparing this to the puzzle 
submitted. 

To incorporate this client puzzle into a client puzzle protocol the server, on 
receiving a valid solution, allocates buffer slots, by using a hash table on the 
values of M, for each puzzle and correct solution submitted. This ensures that 
only one puzzle instance and solution are accepted for a given value of M. 
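
The generation, solving and verification steps described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: SHA-256 stands in for $H$ (so $m = 256$), bit strings are represented as Python strings of '0' and '1', and the byte encoding of the input $(s, \mathrm{str}, j)$ to $H$ is a hypothetical choice, since [14] does not fix one. The function names are ours and are not taken from [14].

```python
import hashlib
from itertools import product

M_BITS = 256  # output length m of H; SHA-256 is an assumption of this sketch


def h_bits(data: bytes) -> str:
    """Hash to a bit string of length M_BITS (a str of '0'/'1')."""
    digest = hashlib.sha256(data).digest()
    return format(int.from_bytes(digest, "big"), f"0{M_BITS}b")


def sub_puzzle(s: bytes, str_: bytes, j: int, alpha: int):
    """Return (x_j, (z_j, y_j)): the hidden solution and the sub-puzzle P_j."""
    # Hypothetical encoding of the triple (s, str, j) as input to H.
    sigma = h_bits(s + b"|" + str_ + b"|" + str(j).encode())
    x, z = sigma[:alpha], sigma[alpha:]   # x_j = sigma<1,alpha>, z_j = sigma<alpha+1,m>
    y = h_bits(sigma.encode())            # y_j = H(sigma_j)
    return x, (z, y)


def gen_puz(s: bytes, q: tuple, str_: bytes):
    """Build puz = (Q, str, (P_1, ..., P_beta)) for Q = (alpha, beta)."""
    alpha, beta = q
    subs = tuple(sub_puzzle(s, str_, j, alpha)[1] for j in range(1, beta + 1))
    return (q, str_, subs)


def find_soln(puz):
    """Brute-force each sub-puzzle: try all alpha-bit prefixes x' of z_j."""
    (alpha, _), _, subs = puz
    soln = []
    for z, y in subs:
        for bits in product("01", repeat=alpha):
            cand = "".join(bits)
            if h_bits((cand + z).encode()) == y:
                soln.append(cand)
                break
    return tuple(soln)


def ver_soln(puz, soln) -> bool:
    """Check H(soln_j || z_j) = y_j for every sub-puzzle."""
    _, _, subs = puz
    return len(soln) == len(subs) and all(
        h_bits((x + z).encode()) == y for x, (z, y) in zip(soln, subs)
    )


def ver_auth(s: bytes, puz) -> bool:
    """Check authenticity by regenerating the puzzle from s, Q and str."""
    q, str_, _ = puz
    return gen_puz(s, q, str_) == puz
```

With a small hardness parameter such as $Q = (8, 2)$ the solver tests at most $2^8$ candidates per sub-puzzle, so the whole round trip (generate, solve, verify) runs in well under a second.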


3 Client Puzzles 


The role of a client puzzle in a protocol is to give one party some assurance that 
the other party has spent at least a given amount of effort computing a solution 
to a given puzzle instance. In this section we give a formal definition of a client 
puzzle in the most general sense. 


FORMAL SYNTAX OF A CLIENT PUZZLE. A client puzzle is a tuple of algorithms: 
a setup algorithm for generating long term public and private parameters, an 
algorithm for generating puzzle instances of a given difficulty, a solution finding 
algorithm, an algorithm for verifying authenticity of a puzzle instance and an 
algorithm for verifying correctness of puzzle and solution pairs. We formally 
define a client puzzle as follows. 


Definition 1 (Client Puzzle). A client puzzle CPuz = (Setup, GenPuz, 
FindSoln, VerAuth, VerSoln) is given by the following algorithms: 

• Setup is a p.p.t. setup algorithm. On input of $1^k$, for security parameter 
$k$, it performs the following operations: 

  • Selects the long term secret key space sSpace, hardness space QSpace, 
  string space strSpace, puzzle instance space puzSpace and solution space 
  solnSpace. 

  • Selects the long term puzzle generation key $s \xleftarrow{\$} \mathrm{sSpace}$. 

  • Sets $\Pi$ to additional public information, such as some description of 
  algorithms required for the client puzzle. 

  • Sets params ← (sSpace, puzSpace, solnSpace, QSpace, $\Pi$) and outputs 
  (params, $s$). 

The tuple params is the public system parameters and as such is not explicitly 
given as an input to other algorithms. The value $s$ is kept private by the puzzle 
generator. 


• GenPuz is a p.p.t. puzzle generation algorithm. On input of $s \in$ sSpace, 
$Q \in$ QSpace and str $\in$ strSpace it outputs a puzzle instance 
puz $\in$ puzSpace. 


• FindSoln is a probabilistic solution finding algorithm. On input of 
puz $\in$ puzSpace and a run time $\tau \in \mathbb{N}$ it outputs a potential 
solution soln $\in$ solnSpace after at most $\tau$ clock cycles of execution. 


• VerAuth is a d.p.t. puzzle authenticity verification algorithm. On input of 
$s \in$ sSpace and puz $\in$ puzSpace this outputs true or false. 


• VerSoln is a deterministic solution verification algorithm. On input of 
puz $\in$ puzSpace and a potential solution soln $\in$ solnSpace this outputs 
true or false. 

For correctness we require that if (params, $s$) ← Setup($1^k$) and 
puz ← GenPuz($s$, $Q$, str), for $Q \in$ QSpace and str $\in$ strSpace, then 

• VerAuth($s$, puz) = true and 


• $\exists\, \tau \in \mathbb{N}$ such that soln ← FindSoln(puz, $\tau$) and 
VerSoln(puz, soln) = true. 
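
The syntax of Definition 1 can be written down as an abstract interface. The sketch below is only one possible rendering: the class name `ClientPuzzle`, the method names and the helper `check_correctness` are our own hypothetical choices. The helper tests the correctness condition for one caller-supplied run time $\tau$, whereas the definition only requires that some suitable $\tau$ exists.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class ClientPuzzle(ABC):
    """The five algorithms of a client puzzle (Setup, GenPuz, FindSoln,
    VerAuth, VerSoln); concrete schemes subclass this."""

    @abstractmethod
    def setup(self, k: int) -> Tuple[Any, Any]:
        """Return (params, s) for security parameter k."""

    @abstractmethod
    def gen_puz(self, s: Any, q: Any, str_: Any) -> Any:
        """Return a puzzle instance of hardness q bound to the string str_."""

    @abstractmethod
    def find_soln(self, puz: Any, tau: int) -> Any:
        """Return a potential solution within a budget of tau steps."""

    @abstractmethod
    def ver_auth(self, s: Any, puz: Any) -> bool:
        """Decide whether puz was generated under the secret s."""

    @abstractmethod
    def ver_soln(self, puz: Any, soln: Any) -> bool:
        """Decide whether soln solves puz."""


def check_correctness(cp: ClientPuzzle, k: int, q: Any, str_: Any, tau: int) -> bool:
    """Test the correctness condition for one choice of (Q, str, tau)."""
    _params, s = cp.setup(k)
    puz = cp.gen_puz(s, q, str_)
    return cp.ver_auth(s, puz) and cp.ver_soln(puz, cp.find_soln(puz, tau))
```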


REMARK 1. Typically client puzzles use a set of system parameters, most notably 
system time, as input to the puzzle generation algorithm. This is so the server 
has a mechanism for expiring puzzles handed out to clients. In our model we use 
str to capture this input and do not enforce any particular structure on it. 


REMARK 2. To prevent DoS attacks that exhaust the server memory it is desir- 
able that the server stores as little state as possible for uncompleted protocol runs 
(i.e. before a puzzle has been solved). We refer to this concern of client puzzles 
as “state storage costs” [3]. We build this into our definition of a client puzzle by 
insisting that only a single value, s, is stored by a server; all the data necessary 
to solve a given puzzle and to re-generate, and hence verify authenticity of a 
puzzle and solution pair, is included in the puzzle description puz. 


REMARK 3. Generally, for a puzzle to be “secure” when used within a client 
puzzle protocol, we want puzzles generated to be unique and for puzzle and 
solution pairs to only be validly used once by a client. In actual usage, a server can 
filter out resubmitted correctly solved puzzle and solution pairs by, for example, 
using a hash table mechanism. Uniqueness of puzzles can be ensured by having 
GenPuz select a random nonce $n_S$ and use this in the puzzle generation. 


REMARK 4. Our definition assumes private verifiability for VerAuth. Generally 
the only party concerned with checking who generated a given puzzle is the 
puzzle generator (client puzzles are used before any other transactions take place 
and to protect the generator and no other party). Although in some cases it may 
be useful to have publicly verifiable puzzles it would complicate the definition 
and we choose to keep our definition practical yet as simple as possible. 


4 Security Notions for Client Puzzles 


We define two notions for client puzzles. The first measures the ability of an 
adversary to produce a correctly authenticating puzzle with an unknown private 
key. We refer to this as the ability of an adversary to forge a client puzzle. The 
second notion gives a measure of the likelihood of an adversary finding a solution 
to a given puzzle within a given number of clock cycles of execution. We refer 
to this as the difficulty of a client puzzle. Intuitively, these are both what one 
would expect to require from a client puzzle given its role in defenses against 
DoS attacks; being able to either forge puzzles or solve them faster than expected 
allows an adversary to mount a DoS attack. 

We first review the definition of a function family since we will use function 
families to express security of a given client puzzle in terms of difficulty. A 
function family is a map $F : I \times D \to R$. The set $I$ is the set of all 
possible indices, $D$ the domain and $R$ the range. Unless otherwise specified we 
assume $I = \mathbb{N}$. The set $R$ is finite and all sets are nonempty. We write 
$F_i : D \to R$ for $F_i(d) = F(i, d)$ where $i \in I$ and refer to $F_i$ as an 
instance of $F$. 


UNFORGEABILITY OF PUZZLES. We first define our notion of unforgeability of 
client puzzles. Intuitively, we require an adversary that sees puzzles generated 
by the server (possibly together with their associated solutions), and that can 
verify the authenticity of any puzzle it chooses, cannot produce a valid looking 
puzzle on his own. 

To formalize unforgeability of a client puzzle we use the following game 
$\mathsf{Exec}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k)$ between a challenger C and an 
adversary A. 


(1) The challenger C first runs Setup on input $1^k$ to obtain (params, $s$). The 
tuple params is given to A and $s$ is kept secret by C. 

(2) The adversary A gets to make as many CreatePuz(Q, str) and CheckPuz(puz) 
queries as it likes which C answers as follows. 


• CreatePuz(Q, str) queries. A new puzzle is generated 
puz ← GenPuz($s$, Q, str) and output to A. 


• CheckPuz(puz) queries. If VerAuth($s$, puz) = true and puz was not output 
by C in response to a CreatePuz query then C terminates the game 
setting the output to 1. Otherwise false is returned to A. 


(3) If C does not terminate the game in response to a CheckPuz query then 
eventually A terminates and the output of the game is set to 0. 

We say the adversary A wins if 
$\mathsf{Exec}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k) = 1$ and loses otherwise. We 
define the advantage of such an adversary as 


$$\mathsf{Adv}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k) = \Pr\left[\mathsf{Exec}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k) = 1\right].$$ 


Puzzle-unforgeability then means that no efficient adversary can win the above 
game with non-negligible probability. 

Definition 2 (Puzzle-unforgeability). A client puzzle CPuz is UF secure if 
for any p.p.t. adversary A its advantage 
$\mathsf{Adv}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k)$ is a negligible function 
of $k$. 
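
The unforgeability game can be expressed as a small test harness. The following is a minimal sketch, assuming the puzzle scheme is supplied as three callables (`setup`, `gen_puz`, `ver_auth`, names of our choosing), that puzzles are hashable values such as tuples, and that the adversary is a Python function taking the public parameters and the two oracles. It returns the output bit of the game for a single run; estimating an advantage would require repeating it.

```python
class _Forged(Exception):
    """Raised internally when the challenger detects a successful forgery."""


def exec_uf(setup, gen_puz, ver_auth, adversary, k: int) -> int:
    """Run one execution of the unforgeability game; return 1 if A wins."""
    params, s = setup(k)          # step (1): s stays with the challenger
    issued = set()                # puzzles output in response to CreatePuz

    def create_puz(q, str_):
        puz = gen_puz(s, q, str_)
        issued.add(puz)
        return puz

    def check_puz(puz):
        # A fresh puzzle that authenticates ends the game with output 1.
        if ver_auth(s, puz) and puz not in issued:
            raise _Forged
        return False

    try:
        adversary(params, create_puz, check_puz)   # step (2)
    except _Forged:
        return 1
    return 0                      # step (3): A terminated without a forgery
```

An adversary here is simply a function such as `lambda params, create_puz, check_puz: check_puz(create_puz(1, b"x"))`, which replays an issued puzzle and therefore loses.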


REMARK 1. In the game $\mathsf{Exec}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k)$ we allow A access to all algorithms defined 
in CPuz. In particular, we allow unlimited access to the GenPuz algorithm for any 
given chosen inputs. This allows A to generate as many puzzles as it wishes (since 
A is p.p.t. it will anyway generate at most polynomially many) with any given 
chosen key and difficulty values. Notice that the adversary can find solutions to 
any puzzle by running the FindSoln algorithm which is public. These abilities 
are sufficient to mimic the environment in which a DoS attacker would sit. 


DIFFICULTY OF SOLVING PUZZLES. We formalize the idea that a puzzle CPuz cannot 
be solved trivially via the game 
$\mathsf{Exec}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k)$ between a challenger C and 
an adversary A. The game is defined for each hardness parameter 
$Q \in \mathbb{N}$ as follows: 


(1) The challenger C runs Setup on input $1^k$ to obtain (params, $s$) and passes 
params to A. 

(2) The adversary A is allowed to make any number of CreatePuzSoln(str) 
queries throughout the game. In response to each such query C generates 
a new puzzle as puz ← GenPuz($s$, Q, str) and finds a solution soln such that 
VerSoln(puz, soln) = true. The pair (puz, soln) is then output to A. 

(3) At any point during the execution A is allowed to make a single 
Test($\mathrm{str}^\dagger$) query. The challenger then generates a challenge 
puzzle as $\mathrm{puz}^\dagger$ ← GenPuz($s$, Q, $\mathrm{str}^\dagger$) which 
it returns to A. 


Adversary A terminates its execution by outputting a potential solution 
$\mathrm{soln}^\dagger$. We define the running time $\tau$ of A as being the 
running time of all of the experiment 
$\mathsf{Exec}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k)$. 

We say the adversary wins $\mathsf{Exec}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k)$ 
if VerSoln($\mathrm{puz}^\dagger$, $\mathrm{soln}^\dagger$) = true. In this case 
we set the output of $\mathsf{Exec}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k)$ to be 
1 and otherwise to 0. We then define the success of an adversary A against CPuz as 

$$\mathsf{Succ}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k) = \Pr\left[\mathsf{Exec}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k) = 1\right].$$ 


We define the difficulty of puzzle solving by requiring that for any puzzle hardness 
the success of any adversary that runs in a bounded number of steps falls below 
a certain threshold (that is related to the hardness of the puzzle). 

Definition 3 (Puzzle-difficulty). Let 
$\epsilon : \mathbb{N}^2 \to (\mathbb{N} \to [0,1])$ be a family of 
monotonically increasing functions. We use the notation $\epsilon_{k,Q}(\cdot)$ 
for the function within this family corresponding to security parameter $k$ and 
hardness parameter $Q$. We say a client puzzle CPuz is $\epsilon(\cdot)$-DIFF if 
for all $\tau \in \mathbb{N}$, for all adversaries A in 
$\mathsf{Exec}^{Q,\mathsf{DIFF}}_{A,\mathsf{CPuz}}(k)$, for all security 
parameters $k \in \mathbb{N}$, and for all $Q \in \mathbb{N}$ it holds that 


$$\mathsf{Succ}^{Q,\mathsf{DIFF}}_{A_\tau,\mathsf{CPuz}}(k) \le \epsilon_{k,Q}(\tau)$$ 


where $A_\tau$ is the adversary A restricted to at most $\tau$ clock cycles of 
execution. 


REMARK 1. The security game above allows A to obtain many puzzle and solution 
pairs by making queries to model actual usage in DoS settings; when 
a client puzzle is used as part of a client puzzle protocol an adversary may see 
many such puzzles and solutions exchanged between a given generator and solver 
on a network. The adversary could then learn something from these. 


REMARK 2. In the definition of the Test query, we do allow the string $\mathrm{str}^\dagger$ to 
be one previously submitted as a CreatePuzSoln query and allow CreatePuzSoln 
queries on any string including $\mathrm{str}^\dagger$ after the Test query. It then immediately 
follows that a difficult puzzle needs to be such that each puzzle generated is 
unique. Otherwise, a previously obtained solution through the CreatePuzSoln 
query may serve as a solution to the challenge query. Furthermore, it also follows 
that solutions to some puzzles should not be related to the solutions of other 
puzzles, as otherwise a generalization of the above attack would work. 


REMARK 3. The queries CreatePuz (used in the game for puzzle-unforgeability) 
and CreatePuzSoln (used in the above game) are related, but differ in two ways. 
The first is that CreatePuzSoln outputs a puzzle together with its solution. The 
second is more subtle: in a CreatePuz query we allow A to specify the value of Q 
used but in CreatePuzSoln we do not (the value of Q is fixed throughout the 
difficulty game). 


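REMARK 4. Clearly any puzzle that is $\epsilon(\cdot)$-DIFF is also 
$(\epsilon(\cdot) + \mu)$-DIFF where $\mu \in \mathbb{R}_{>0}$ is such that 
$\epsilon(\tau) + \mu \le 1$ (since 
$\mathsf{Succ}^{Q,\mathsf{DIFF}}_{A_\tau,\mathsf{CPuz}}(k) \le \epsilon_{k,Q}(\tau) \le \epsilon_{k,Q}(\tau) + \mu$). 
The most accurate measure of difficulty for a given puzzle CPuz is then the 
smallest function satisfying this bound, namely 
$\epsilon_{k,Q}(\tau) = \sup_{A_\tau} \mathsf{Succ}^{Q,\mathsf{DIFF}}_{A_\tau,\mathsf{CPuz}}(k)$. 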

REMARK 5. Since we measure the running time of the adversary in clock cycles, 
the model abstracts away the possibility that the adversary may be distributed 
and thus facilitates further analysis (for example of the effectiveness of client 
puzzle defense against DoS attacks). 


5 An Attack on the Juels and Brainard Puzzles 


In this section we describe an attack on the Juels and Brainard [14] client 
puzzle mechanism as described in Section 2. The attack works because puzzles are 
forgeable, which is due to a crucial weakness in puzzle generation; each set of 
generation parameters defines a family of puzzles each with a different hardness 
value. Finally we construct a DDoS attack on servers using certain client puzzle 
protocols based on this construction. This attack clearly demonstrates the ap- 
plicability of our definitions and how they can be used to find problems with a 
given client puzzle construction. 


PROVING FORGEABILITY. The construction is forgeable because the authentication 
is not unique to a given instance but covers a number of instances of varying 
difficulty. This occurs because the puzzle instance difficulty is not included in 
the first preimage of the sub-puzzle construction. We exploit this weakness and 
construct an adversary A with 
$\mathsf{Adv}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k) = 1$. We have the following lemma 
regarding the forgeability of the Juels and Brainard construction. 


Lemma 1. The client puzzle construction of Juels and Brainard [14] is not UF 
secure. 


Proof. To prove this we construct an adversary A against the UF security of the 
construction that can win the security game 
$\mathsf{Exec}^{\mathsf{UF}}_{A,\mathsf{CPuz}}(k)$ with probability 1. 
We now describe the details of A. 

At the start of the security game A is given a set of public parameters. The 
adversary then makes a query CreatePuz(Q, str) for some random choices of Q 
and str where $Q = (\alpha, \beta)$ and receives a puzzle instance 
$\mathrm{puz} = (Q, \mathrm{str}, P = (P_1, P_2, \ldots, P_\beta))$ in response. 
Next A removes the first bit of each $P_j$ (that is, the first bit of $z_j$ in 
$P_j = (z_j, y_j)$) to obtain $P_j^\dagger$ and constructs 
$Q^\dagger = (\alpha + 1, \beta)$ and 
$\mathrm{puz}^\dagger = (Q^\dagger, \mathrm{str}, P^\dagger = (P_1^\dagger, P_2^\dagger, \ldots, P_\beta^\dagger))$. 
The adversary then makes a query CheckPuz($\mathrm{puz}^\dagger$). 

Clearly A wins with probability 1 since puz and $\mathrm{puz}^\dagger$ are both 
generated from the same $s$ and str; hence $\mathrm{puz}^\dagger$ will correctly 
verify yet was not output from a CreatePuz query. 
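
The forgery in the proof can be checked concretely. The sketch below re-implements sub-puzzle generation under the same illustrative assumptions as before (SHA-256 as $H$, bit strings as Python strings of '0' and '1', a hypothetical encoding of $(s, \mathrm{str}, j)$) and then builds $\mathrm{puz}^\dagger$ from an issued puzzle without knowledge of $s$. The function names are ours.

```python
import hashlib

M_BITS = 256  # output length of H; SHA-256 is an assumption of this sketch


def h_bits(data: bytes) -> str:
    """Hash to a bit string of length M_BITS (a str of '0'/'1')."""
    digest = hashlib.sha256(data).digest()
    return format(int.from_bytes(digest, "big"), f"0{M_BITS}b")


def gen_puz(s: bytes, q: tuple, str_: bytes):
    """Juels-Brainard style generation: note that Q is NOT an input to H."""
    alpha, beta = q
    subs = []
    for j in range(1, beta + 1):
        sigma = h_bits(s + b"|" + str_ + b"|" + str(j).encode())
        subs.append((sigma[alpha:], h_bits(sigma.encode())))  # P_j = (z_j, y_j)
    return (q, str_, tuple(subs))


def ver_auth(s: bytes, puz) -> bool:
    """Authenticity check: regenerate the puzzle from s, Q, str and compare."""
    q, str_, _ = puz
    return gen_puz(s, q, str_) == puz


def forge(puz):
    """Build puz-dagger for Q-dagger = (alpha + 1, beta) without knowing s:
    drop the first bit of every z_j and keep every y_j unchanged."""
    (alpha, beta), str_, subs = puz
    forged_subs = tuple((z[1:], y) for z, y in subs)
    return ((alpha + 1, beta), str_, forged_subs)
```

Calling `forge` on any puzzle returned by `gen_puz` yields a different puzzle that `ver_auth` accepts under the same secret, which is exactly the winning CheckPuz query of the proof.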


REMARK 1. One could also prove Lemma 1 by having the adversary construct 
the forgery as $Q^\dagger = (\alpha, \beta - 1)$ and then 
$\mathrm{puz}^\dagger = (Q^\dagger, \mathrm{str}, (P_1, \ldots, P_{\beta-1}))$. One 
could also vary the number of bits moved between $\alpha$ and each $P_j$ or change 
the number of sub-puzzles deleted. We give the proof in this particular manner 
because this specific method allows for the construction of a DDoS attack under 
the assumptions we make about the protocol using this particular client puzzle. 
We describe this attack next. 


CONSTRUCTING A DDoS ATTACK. We now use the forgeability of the construction 
to mount a DDoS attack on client puzzle protocols based on this client 
puzzle. The attack works when the difficulty parameter is increased in a certain 
way and when the hash table, mentioned in [14] and used to prevent multiple 
puzzle instance and solution submissions, is based on some unique data for each 
instance that is not in the preimage of any sub-puzzle. A hash table mechanism 
that depends on some unique data contained in each sub-puzzle preimage, as 
is mentioned in [14], would thwart the following DDoS attack on client puzzle 
protocols based on this client puzzle. 

We first assume the client puzzle is used in the client puzzle protocol of [14] 
and the generator increases Q by increasing $\alpha$ many times for each increase 
in $\beta$. We also assume any hash tables used are computed using either the 
puzzle instance alone or the correct solutions alone. 

To mount the DDoS attack the adversary commands each of its zombies (platforms 
the adversary controls) to start a run of the protocol with the server under 
attack. The server will begin to issue puzzle instances and then, when enough 
requests are received, will increase Q by incrementing $\alpha$. Each zombie 
computes a solution to the first puzzle it receives and submits this to the 
server. Then, while this puzzle has not expired, each time $\alpha$ is 
incremented, a new puzzle and solution pair is trivially computed by removing the 
first bit from each $z_j$ and concatenating this to the end of each 
$\mathrm{soln}_j$ previously computed. The new puzzle and solution pair are 
submitted to the server and will correctly verify and will then be allocated 
buffer space (due to our assumptions on the hash table mechanism). When a 
zombie's puzzle expires it obtains a new one. As the value Q is increased then so 
will the puzzle expiry period and hence more forged puzzles can be used per valid 
puzzle obtained, eventually exhausting the memory resources of the server. 

Also, even if we assume that the buffer allocation based on the hash table 
mechanism is as in [14] the attack will still consume a huge amount of server 
computational resources. This is because the adversary can trivially spoof new 
puzzle instances and solutions from previous ones. These will not be allocated 
buffer space due to the hash table mechanism, but will consume computation 
via server verification computations. In the next section we give an example 
instantiation of a generic construction that is a repaired version of the 
sub-puzzle mechanism, that is, an unforgeable version of the sub-puzzle 
construction. 


6 A Generic Client Puzzle Construction 


In this section we provide a generic construction for a client puzzle which also 
repairs the flaw identified in the previous section with respect to the Juels and 
Brainard puzzle. Our construction is based on a pseudorandom function (PRF) 
and a one way function (OWF). We prove our generic construction is secure 
according to the definitions we put forth in this paper, and show one possible 
instantiation. Intuitively, the unforgeability of puzzles is ensured by the use 
of the PRF and the difficulty of solving puzzles is ensured by the hardness of 
inverting the one-way function. We first review some notational conventions and 
definitions regarding function families, pseudorandom functions, and concrete 
notions for pseudorandom function families and one way function families. 

If $F$ is a function family then we use the notation $f \xleftarrow{\$} F$ for $i \xleftarrow{\$} I;\ f \leftarrow F_i$. 
We denote the set of all possible functions mapping elements of D to R by 
Func(D, R). A random function from D to R is then a function selected uniformly 
at random from Func(D, R). 


PSEUDORANDOMNESS. We define the PRF game 
$\mathsf{Exec}^{\mathsf{PRF},b}_{B,F}(k)$ for an adversary B against the function 
family $F : K \times D \to R$, where $|K| = 2^k$, as follows. 

(1) For $b = 1$ the adversary B has black box access to a truly random function 
chosen from the set Func(D, R) and for $b = 0$ the adversary B has black box 
access to a function $F_s$ chosen at random from $F$. 


(2) The adversary B is allowed to ask as many queries as it wants to whichever 
function it has black box access to. Eventually B terminates outputting a 
bit $b^*$. 

We set the output of $\mathsf{Exec}^{\mathsf{PRF},b}_{B,F}(k)$ to the bit $b^*$. 

We then define the advantage of an adversary against $F$ in terms of PRF as 


$$\mathsf{Adv}^{\mathsf{PRF}}_{B,F}(k) = \left|\Pr\left[\mathsf{Exec}^{\mathsf{PRF},1}_{B,F}(k) = 1\right] - \Pr\left[\mathsf{Exec}^{\mathsf{PRF},0}_{B,F}(k) = 1\right]\right|.$$ 
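
As an illustration of the two worlds of the PRF game, the following sketch gives the adversary an oracle that is either a lazily sampled random function ($b = 1$) or a keyed function ($b = 0$). HMAC-SHA256 is used as the keyed function purely as an assumption of this sketch; the function name `exec_prf` and its parameters are hypothetical. The return value is the adversary's output bit, so the advantage is the gap between the probabilities of returning 1 in the two worlds.

```python
import hashlib
import hmac
import os


def exec_prf(b: int, adversary, key_bytes: int = 16, out_bytes: int = 32) -> int:
    """Run the PRF game in world b and return the adversary's output bit."""
    if b == 1:
        table = {}  # lazily sampled truly random function from D to R

        def oracle(d: bytes) -> bytes:
            if d not in table:
                table[d] = os.urandom(out_bytes)
            return table[d]
    else:
        key = os.urandom(key_bytes)  # key chosen at random from the keyspace

        def oracle(d: bytes) -> bytes:
            # HMAC-SHA256 stands in for F_s; this is an assumption of the sketch.
            return hmac.new(key, d, hashlib.sha256).digest()

    return int(adversary(oracle))
```

For example, an adversary that only checks that repeated queries give repeated answers outputs the same bit in both worlds and so has advantage 0.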


CONCRETE PSEUDORANDOM AND ONE WAY FUNCTION FAMILIES. Here we 
briefly review concrete notions of security for pseudorandom function and one 
way function families. We depart from the typical “$(\epsilon, t)$-hardness” 
style of definitions, as they are not sufficient for our purposes. Instead we 
view $\epsilon$, the probability of a break, as a function of the running time 
$\tau$ of the adversary. So, a primitive is $\epsilon(\cdot)$-secure if for all 
adversaries running in time $\tau$ the probability of breaking the primitive is 
at most $\epsilon(\tau)$. 


Definition 4 ($\nu_k(\cdot)$-PRFF). Let $F : K \times D \to R$ be a function 
family and $\nu : \mathbb{N} \to (\mathbb{N} \to [0,1])$ be a family of 
monotonically increasing functions. We say $F$ is a $\nu_k(\cdot)$-PRFF if for 
all $k \in \mathbb{N}$ and for all adversaries $A_\tau$ running in at most $\tau$ 
clock cycles it holds that 
$\mathsf{Adv}^{\mathsf{PRF}}_{A_\tau,F}(k) \le \nu_k(\tau)$. 

Note that, in the definition of a $\nu_k(\cdot)$-PRFF, the security parameter $k$ 
specifies the size of the keyspace for the game 
$\mathsf{Exec}^{\mathsf{PRF},b}_{A,F}(k)$ and the actual key, and hence 
function from the family used, is chosen at random from this keyspace. 


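Definition 5 ($\epsilon_i(\cdot)$-OWF). For an adversary A we define its 
advantage against a function $\phi_i : X \to Y$, where $X$ is fixed and finite, in 
terms of OWF as 


$$\mathsf{Adv}^{\mathsf{OWF}}_{A,\phi_i} = \Pr\left[x \xleftarrow{\$} X;\ y \leftarrow \phi_i(x);\ x' \leftarrow A(y) : \phi_i(x') = y\right].$$ 


Let $\epsilon_i : \mathbb{N} \to [0,1]$ be a monotonically increasing function. 
Then, the function $\phi_i$ is an $\epsilon_i(\cdot)$-OWF if for all adversaries 
$A_\tau$ running in at most $\tau$ clock cycles it holds that 
$\mathsf{Adv}^{\mathsf{OWF}}_{A_\tau,\phi_i} \le \epsilon_i(\tau)$. 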


We then extend this definition to a family of functions as follows: 


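Definition 6 ($\epsilon(\cdot)$-OWFF). Let $\phi : \mathbb{N} \to (X \to Y)$ and 
$\epsilon : \mathbb{N} \to (\mathbb{N} \to [0,1])$ be function families. We say 
$\phi$ is an $\epsilon(\cdot)$-OWFF if for all $i \in \mathbb{N}$ the function 
$\phi_i : X \to Y$ is an $\epsilon_i(\cdot)$-OWF. 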


THE GENERIC CONSTRUCTION. Our generic construction is based on the method 
of Juels and Brainard [14]. Most client puzzle constructions based on one way 
functions, such as the discrete log based scheme of [28] and the RSA based 
scheme of [10], can be described in this manner with some minor modifications. 
So, our generic construction pins down sufficient assumptions on the building 
blocks that imply security of the resulting puzzle. We let $k \in \mathbb{N}$ 
then let $F : K \times D \to X$ where $|X| \ge |K| = 2^k$ be a function family 
indexed by elements of $K$. The domain $D$ of $F_s$ is 3-tuples of the form 
$\mathbb{N} \times \{0,1\}^* \times \{0,1\}^* \subseteq \{0,1\}^*$. We write 
$F_s(\langle \cdot, \cdot, \cdot \rangle)$ when we want to specify the exact 
encoding of an element of $D$ explicitly as an input to $F$. We further let 
$\phi : \mathbb{N} \to (X \to Y)$ be a family of functions indexed by $Q$. We 
assume there is a polynomial time algorithm to compute $\phi_Q$ for each value of 
$Q$ and input. The various algorithms in the scheme are then as follows: 

Setup($1^k$). The various spaces are chosen: sSpace ← $K$, QSpace ← $\mathbb{N}$, 
strSpace ← $\{0,1\}^*$, solnSpace ← $X$ and 
puzSpace ← QSpace × strSpace × $\{0,1\}^*$ × $Y$. The parameter $\Pi$ is 
assigned to be the polynomial time algorithm to compute $\phi_Q$ for all 
$Q \in$ QSpace and $x \in X$. Finally, the value $s$ is chosen as 
$s \xleftarrow{\$}$ sSpace and the tuple params constructed then output. 

[Figure: starting from $(Q, \mathrm{str}, n_S)$ the generator computes 
$x = F_s(Q, \mathrm{str}, n_S)$ (as part of GenPuz and VerAuth) and then 
$y = \phi_Q(x)$ (as part of GenPuz, VerSoln and VerAuth), which yields puz; the 
solver recovers $x = \phi_Q^{-1}(y)$ (as part of FindSoln), which yields soln.] 

Fig. 2. Solid arrows are actions performed by a generator and dashed ones by a solver. 
The lists of algorithms above/below arrows imply the actions are performed as part 
of these algorithms. The details of how each action is used in the given algorithm are 
given in the full description. 


GenPuz($s$, $Q$, str). A nonce is selected $n_S \xleftarrow{\$} \{0,1\}^k$. Next 
$x$ is computed as $x \leftarrow F_s(Q, \mathrm{str}, n_S)$. The value $y \in Y$ 
is computed as $y \leftarrow \phi_Q(x)$ and the puzzle assigned to be 
$\mathrm{puz} = (Q, \mathrm{str}, n_S, y)$ and output. 


FindSoln(puz, $\tau$). While this algorithm is within the allowed number of clock 
cycles of execution it randomly samples elements from the set of possible 
solutions without replacement and for each potential preimage $x' \in X$ 
computes $y' \leftarrow \phi_Q(x')$. If $y' = y$ this outputs $x'$ then halts and 
otherwise continues with random sampling. If this algorithm reaches the last 
clock cycle of execution then it outputs a random element of the remaining 
unsampled preimage space. The set of possible solutions is generally a subset of 
$X$ that is defined by the value $y$ of size dependent upon $Q$ in some manner; 
the details of how the size varies depend upon the function family $\phi$. 


VerAuth($s$, puz′). For a puzzle $\mathrm{puz}' = (Q', \mathrm{str}', n'_S, y')$ 
this computes $x'$ as $x' \leftarrow F_s(Q', \mathrm{str}', n'_S)$ then 
$y \leftarrow \phi_{Q'}(x')$. If $y' = y$ this outputs true and otherwise 
outputs false. 


VerSoln(puz′, soln′). Given a puzzle $\mathrm{puz}' = (Q', \mathrm{str}', n'_S, y')$ 
and a potential solution $\mathrm{soln}' = x'$ this checks if 
$\phi_{Q'}(x') = y'$ and if so outputs true and otherwise outputs false. 


We use the notation CPuz = PROWF($F$; $\phi$) for the generic construction in 
this manner. The construction is summarized in Figure 2. 
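
The algorithms above can be instantiated in a few lines to see how the pieces fit together. The sketch below is illustrative only and rests on several assumptions that are not part of the construction itself: HMAC-SHA256 plays the role of the PRF $F_s$, the hypothetical one way function family maps a 256-bit string $x$ to all but its first $Q$ bits together with SHA-256 of $x$ (so that solving costs about $2^Q$ hash evaluations), the triple $(Q, \mathrm{str}, n_S)$ is encoded with length prefixes, and FindSoln uses exhaustive search rather than random sampling. All function names are ours.

```python
import hashlib
import hmac
import os
from itertools import product

KEY_BYTES = 16  # size of the secret s and of the nonce n_S (an assumption)


def _bits(b: bytes) -> str:
    """Render bytes as a bit string of '0'/'1' characters."""
    return format(int.from_bytes(b, "big"), f"0{8 * len(b)}b")


def prf(s: bytes, q: int, str_: bytes, n_s: bytes) -> str:
    """F_s(Q, str, n_S): HMAC-SHA256 over a length-prefixed encoding."""
    parts = (str(q).encode(), str_, n_s)
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return _bits(hmac.new(s, msg, hashlib.sha256).digest())


def phi(q: int, x: str):
    """Hypothetical OWF family: reveal x without its first Q bits, plus H(x)."""
    return (x[q:], hashlib.sha256(x.encode()).hexdigest())


def setup() -> bytes:
    """Choose the long term secret s uniformly at random."""
    return os.urandom(KEY_BYTES)


def gen_puz(s: bytes, q: int, str_: bytes):
    """puz = (Q, str, n_S, y) with y = phi_Q(F_s(Q, str, n_S))."""
    n_s = os.urandom(KEY_BYTES)
    return (q, str_, n_s, phi(q, prf(s, q, str_, n_s)))


def find_soln(puz):
    """Exhaustive search over the 2^Q candidate preimages of y."""
    q, _, _, y = puz
    tail = y[0]
    for bits in product("01", repeat=q):
        cand = "".join(bits) + tail
        if phi(q, cand) == y:
            return cand
    return None


def ver_auth(s: bytes, puz) -> bool:
    """Recompute y from (Q, str, n_S) under s and compare."""
    q, str_, n_s, y = puz
    return phi(q, prf(s, q, str_, n_s)) == y


def ver_soln(puz, soln: str) -> bool:
    """Check phi_Q(soln) = y."""
    q, _, _, y = puz
    return phi(q, soln) == y
```

Because $Q$ is part of the PRF input, shortening the revealed suffix by one bit and claiming hardness $Q + 1$ no longer authenticates, which is the behaviour the construction is designed to achieve.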


REMARK 1. In the definition of an $\epsilon(\cdot)$-OWF we specify the domain $X$ is fixed 
and finite but do not specify the exact size or shape of this; in our generic 
construction this is set to be the output space of some PRF. 


REMARK 2. The exact specification of the FindSoln algorithm is not important 
for our theorems and proofs, nor is it unique. Indeed, other techniques such as 
exhaustive search may even be faster than the algorithm given. The important 
point is such an algorithm exists and can be described. 


REMARK 3. The domain $D$ of $F$ is given as 3-tuples of the form 
$\mathbb{N} \times \{0,1\}^* \times \{0,1\}^*$ which is the same as $\{0,1\}^*$. 
However, we will always construct elements of $D$ from a given tuple rather than 
taking a bit string and encoding it as an element of $D$. Hence we do not refer 
to this as a uniquely recoverable encoding on $D$. 


REMARK 4. In reality the variable $n_S$ need not be sampled at random; it just has 
to be a nonce and could be instantiated with, for example, a counter. We specify 
uniform sampling from the domain of $n_S$ since it makes our proofs simpler and 
easier to follow. 


REMARK 5. Our generic construction is similar to the Juels and Brainard scheme 
[14] but avoids the forgeability problems by including the hardness parameter $Q$ 
in the input to $F$. 


REMARK 6. Finally, we remark that the generic construction where the PRF 
is replaced by a MAC is not necessarily secure. Indeed, one-wayness of 
the generic construction is guaranteed as long as the one-way function is applied 
to a randomly chosen bit string. While this property is ensured through the use 
of a pseudorandom function, it does not always hold for a MAC. For example, 
the combination of an artificial MAC function for which the first half of the 
output bits are constant with the OWF that discards the first half of its input 
is clearly an insecure puzzle construction. 

The following theorems capture the unforgeability and the level of hardness 
enjoyed by our generic construction. Their proofs can be found in the full version 
of the paper. 


Theorem 1. Let F be a PRF family and $\varphi$ a family of functions as described
above such that for each value of Q and for all $y \in Y$ we have
$|\varphi_Q^{-1}(y)| / |\mathcal{X}| \le 1/2^k$, where k is the security parameter. Then the client
puzzle defined by CPuz = PROWF(F; $\varphi$) is UF secure.


To understand the role of the condition that $|\varphi_Q^{-1}(y)| / |\mathcal{X}| \le 1/2^k$, consider the
(extreme) case when F has a small constant number of images, each of which
corresponds to roughly the same number of possible inputs to F. Notice that this
condition does not contradict the pseudorandomness of F, but such a function
is not sufficient to ensure unforgeability. Indeed, an attacker can select a
random $y \in Y$, obtain some x such that $\varphi_Q(x) = y$ and select some random triple
$(Q, str, n_S)$ as the solution to the puzzle. With probability about half, the image
of $(Q, str, n_S)$ is x. The adversary can therefore produce solved puzzles that are
valid without interacting with the server.


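Theorem 2. Let F be a $\nu_k(\cdot)$-PRFF family for the function family $\nu_k(\cdot)$, $\varphi$ an
$\epsilon_{k,Q}(\cdot)$-OWFF for the function family $\epsilon_{k,Q}(\cdot)$ and CPuz = PROWF(F; $\varphi$). Then the
client puzzle PROWF(F; $\varphi$) is $\epsilon'_{k,Q}(\cdot)$-DIFF where

$\epsilon'_{k,Q}(\tau) = 2 \cdot \nu_k(\tau + \tau_0) + \left(1 + \frac{\tau}{2^k - \tau}\right) \cdot \epsilon_{k,Q}(\tau + \tau_1)$

and $\tau_0, \tau_1 \in \mathbb{N}$ are some constants.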


An adversary may try to solve puzzles by either computing the value
$F_s(Q, str, n_S)$ for an unknown value of s or by computing a preimage under $\varphi_Q$
for the value y provided. The function $\nu_k$ in Theorem 2 captures that computing
$F_s$ for an unknown value of s should not be easy; the function F needs to be a
good PRF. Intuitively, k should be chosen to be large enough that it is easier to
compute a preimage of y under $\varphi_Q$ than to compute the corresponding value
$F_s(Q, str, n_S)$.


Impact on Practical Implementations of Puzzles 


Some of the most popular proposals of puzzles are based on hash functions. 
In this section we instantiate our generic construction from the previous section 
using hash functions to construct the needed PRF and one-way function families. 
We obtain essentially a modified Juels and Brainard scheme that incorporates
the defence against the attack on that scheme presented in this paper. The
security analysis is in the random oracle model.

Given a hash function $H : \{0,1\}^* \to \{0,1\}^m$ a standard construction for a PRF
family F is as follows. Key generation selects a random string $s \leftarrow \{0,1\}^k$ where
k is the security parameter. Function application is defined by $F_s(x) = H(s \| x)$
for any $x \in \{0,1\}^*$. Furthermore, given a hash function $G : \{0,1\}^* \to \{0,1\}^n$ we
define the function family $\varphi$ of functions $\varphi_Q : \{0,1\}^m \to \{0,1\}^{m-Q} \times \{0,1\}^n$ by
$\varphi_Q(x) = (x\langle Q+1, m\rangle, G(x))$, where $x\langle Q+1, m\rangle$ denotes bits Q+1 through m of
x. In the full version of the paper we prove that in
the random oracle model, F is a $\nu_k(\cdot)$-PRFF for some function family $\nu$ and
$\varphi$ is an $\epsilon(\cdot)$-OWFF for some function family $\epsilon$, with explicit upper bounds on
$\nu_k(\tau)$ and $\epsilon(\tau)$. Concrete bounds for the security of our construction
follow by instantiating the bounds in Theorems 1 and 2.
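
The hash-based instantiation can be illustrated with a short Python sketch. It is only a toy under stated assumptions: SHA-256 stands in for both H and G, m = 128, the tuple encoding and all function names (prf, phi, gen_puz, find_soln, ver_auth, ver_soln) are hypothetical choices made for this illustration, and exhaustive search is used as one possible FindSoln.

```python
import hashlib
import os

M_BYTES = 16       # m = 128: bit length of PRF outputs (illustrative choice)
NONCE_BYTES = 16   # length of the server nonce n_S (illustrative choice)

def prf(s: bytes, q: int, string: bytes, nonce: bytes) -> bytes:
    """F_s(Q, str, n_S) = H(s || encoding of the tuple), truncated to m bits."""
    encoded = q.to_bytes(2, "big") + len(string).to_bytes(4, "big") + string + nonce
    return hashlib.sha256(s + encoded).digest()[:M_BYTES]

def phi(q: int, x: bytes):
    """phi_Q(x) = (bits Q+1..m of x, G(x)); G is SHA-256 with a domain prefix."""
    m = 8 * len(x)
    revealed = int.from_bytes(x, "big") & ((1 << (m - q)) - 1)
    return revealed, hashlib.sha256(b"G" + x).digest()

def gen_puz(s: bytes, q: int, string: bytes):
    """Sample a nonce, derive x with the PRF and publish y = phi_Q(x)."""
    nonce = os.urandom(NONCE_BYTES)
    y = phi(q, prf(s, q, string, nonce))
    return (q, string, nonce, y)

def find_soln(puz):
    """Exhaustive search over the Q hidden bits (about 2^Q evaluations of G)."""
    q, _string, _nonce, (revealed, digest) = puz
    for hidden in range(1 << q):
        x = ((hidden << (8 * M_BYTES - q)) | revealed).to_bytes(M_BYTES, "big")
        if hashlib.sha256(b"G" + x).digest() == digest:
            return x
    return None

def ver_auth(s: bytes, puz) -> bool:
    """Recompute y from the secret key; Q is part of the PRF input."""
    q, string, nonce, y = puz
    return phi(q, prf(s, q, string, nonce)) == y

def ver_soln(puz, soln: bytes) -> bool:
    q, _string, _nonce, y = puz
    return len(soln) == M_BYTES and phi(q, soln) == y
```

Because Q enters the PRF input, a puzzle whose hardness parameter has been altered no longer passes ver_auth, which mirrors the role of Q described in Remark 5.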
Abstract. Non-malleability is an interesting and useful property which
ensures that a cryptographic protocol preserves the independence of the 
underlying values: given for example an encryption E(m) of some un- 
known message m, it should be hard to transform this ciphertext into 
some encryption E(m*) of a related message m*. This notion has been 
studied extensively for primitives like encryption, commitments and zero- 
knowledge. Non-malleability of one-way functions and hash functions has 
surfaced as a crucial property in several recent results, but it has not un- 
dergone a comprehensive treatment so far. In this paper we initiate the 
study of such non-malleable functions. We start with the design of an 
appropriate security definition. We then show that non-malleability for 
hash and one-way functions can be achieved, via a theoretical construc- 
tion that uses perfectly one-way hash functions and simulation-sound 
non-interactive zero-knowledge proofs of knowledge (NIZKPoK). We also 
discuss the complexity of non-malleable hash and one-way functions. 
Specifically, we show that such functions imply perfect one-wayness and 
we give a black-box based separation of non-malleable functions from 
one-way permutations (which our construction bypasses due to the “non- 
black-box” NIZKPoK based on trapdoor permutations). We exemplify 
the usefulness of our definition in cryptographic applications by show- 
ing that (some variant of) non-malleability is necessary and sufficient 
to securely replace one of the two random oracles in the IND-CCA en- 
cryption scheme by Bellare and Rogaway, and to improve the security of 
client-server puzzles. 


1 Introduction 


MOTIVATION. Informally, non-malleability of some function f is a cryptographic
property that asks that learning f(x) for some x does not facilitate the task of
generating some f(x*) so that x* is related to x in some non-trivial way. This
notion is especially useful when f is used to build higher-level multi-user pro- 
tocols where non-malleability of the protocol itself is crucial (e.g., for voting or 
auctioning). Non-malleability has been rather extensively studied for some cryp- 
tographic primitives. For example, both definitions as well as constructions from 


M. Matsui (Ed.): ASIACRYPT 2009, LNCS 5912, pp. 524 2009. 
© International Association for Cryptologic Research 2009


Foundations of Non-malleable Hash and One-Way Functions 525 


standard cryptographic assumptions are known for encryption, commitments 
and zero-knowledge PREIA]. Non-malleability in the case 
of other primitives, notably for one-way functions and for hash functions [] has 
only recently surfaced as a crucial property in several works [BOMI], which 
we discuss below. 

For instance, plenty of cryptographic schemes are only proved secure in the 
random oracle (RO) model l, where one assumes that a hash function behaves 
as a truly random function to which every party has access. It is well-known
that such proofs do not strictly guarantee security for instantiations with hash 
functions whose only design principles are based on one-wayness and/or
collision-resistance, because random functions possess multiple properties the proofs may
rely on. Hiding all partial information about pre-images, i.e. perfect one-wayness, 
is one of these properties, and has been studied in DI]. Non-malleability is 
another example of such a property. 

An illustrative example is the encryption scheme of Bellare and Rogaway [4],
where a ciphertext of message M has the form (f(r), G(r) ⊕ M, H(r, M)) for
a trapdoor permutation f, hash functions G, H and random r. The scheme is
known to be IND-CCA secure in the random oracle model. However, an
instantiation of H with a malleable function for which given H(r, M) it is possible to
compute H(r, M ⊕ M'), for some fixed M' known to the attacker, renders the
scheme insecure: the attacker can recover M by submitting to the decryption
oracle the valid ciphertext (f(r), G(r) ⊕ M ⊕ M', H(r, M ⊕ M')).
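
The attack can be replayed in a few lines of Python. The sketch below is purely illustrative and rests on assumptions that are not part of the scheme: f is a toy RSA permutation with far too small parameters, G is SHA-256 with a prefix, and H_malleable is an artificial hash chosen so that H(r, M ⊕ M') = H(r, M) ⊕ M'. All names are hypothetical.

```python
import hashlib
import os

# Toy RSA trapdoor permutation f(r) = r^e mod n (insecure parameter sizes).
P, Q = 2**61 - 1, 2**31 - 1            # two Mersenne primes
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # trapdoor exponent (Python 3.8+)
MSG_LEN = 32                           # messages are exactly 32 bytes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def r_bytes(r: int) -> bytes:
    return r.to_bytes(12, "big")       # N has 92 bits, so 12 bytes suffice

def G(r: int) -> bytes:
    return hashlib.sha256(b"G" + r_bytes(r)).digest()

def H_malleable(r: int, m: bytes) -> bytes:
    # Deliberately malleable: H(r, M xor M') = H(r, M) xor M'.
    return xor(hashlib.sha256(b"H" + r_bytes(r)).digest(), m)

def encrypt(m: bytes):
    r = int.from_bytes(os.urandom(16), "big") % N
    return (pow(r, E, N), xor(G(r), m), H_malleable(r, m))

def decrypt(ct):
    c1, c2, tag = ct
    r = pow(c1, D, N)                  # invert f with the trapdoor
    m = xor(G(r), c2)
    return m if tag == H_malleable(r, m) else None

def attack(challenge, decryption_oracle, delta: bytes) -> bytes:
    """Maul the challenge by a nonzero delta and unmask the oracle's answer."""
    c1, c2, tag = challenge
    mauled = (c1, xor(c2, delta), xor(tag, delta))
    assert mauled != challenge         # the oracle only refuses the challenge itself
    return xor(decryption_oracle(mauled), delta)
```

The mauled ciphertext differs from the challenge, so a chosen-ciphertext decryption oracle answers it, and the answer M ⊕ M' immediately reveals M.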

It was shown in [7] that a similar attack can be carried out against the popular
OAEP encryption scheme whenever the instantiation of the underlying hash 
function is malleable. A subsequent work [8] showed that some form of non- 
malleability permits positive results about security of an alleviated version of 
the OAEP scheme in the standard model. However, it remains unclear if the 
approach to non-malleability in [8] expands beyond the OAEP example, and the 
work left open the construction of non-malleable primitives. 

Another motivating example is the abstraction used to model hash functions 
in symbolic (Dolev-Yao) security analysis. In this setting it is axiomatized that
an adversary can compute some hash only when it knows the underlying value. 
Clearly, malleable hash functions do not satisfy this axiom. Therefore, non- 
malleability for hash functions is necessary in order to ensure that symbolic 
analysis is (in general) sound with respect to the standard cryptographic model. 
Otherwise, real attacks that use malleability cannot be captured/discovered in
the more abstract symbolic model. 

In a different vein, and from a more conceptual perspective, higher-level pro- 
tocols could potentially benefit from non-malleable hash functions as a building 
block. A recent concrete example is the recommended use of such non-malleable 
hash functions in a human-computer interaction protocol for protecting local stor- 
age [L]. There, access should be linked to the ability to answer human-solvable 


1 In the sequel we aggregate both one-way functions and hash functions under the 
term hash functions for simplicity. 
puzzles (similar to CAPTCHAs), but it should be infeasible for a machine to maul
puzzles and redirect them under a different domain to other human beings. 

We will also discuss a construction of a cryptographic puzzle from the
literature, designed to prevent DoS attacks, and show that malleability of the
underlying hash function leads to insecure constructions.

Hence, non-malleability is a useful design principle that designers of new hash 
functions should keep in mind. At this point, however, it is not even clear what 
the exact requirements from a theoretical viewpoint are. Therefore, a first neces- 
sary step is to find a suitable definition which is (a) achievable, and (b) applica- 
ble. The next step would be to design practical hash functions and compression 
functions which are non-malleable, or which at least satisfy some weaker variant 
of non-malleability. 


CONTRIBUTIONS. In this paper we initiate the study of non-malleable hash 
functions. We start with the design of an appropriate security definition. Our 
definition uses the standard simulation paradigm, also employed in defining
non-malleability for encryption and commitment schemes. It turns out, however, that
a careless adjustment of definitions for other primitives yields definitions for
non-malleable hash functions that cannot be realized. We therefore motivate and
provide a meaningful variation of the definition which ensures that the notion is
achievable and may be useful in applications.

Testifying to the difference to other cryptographic primitives, we note that 
for non-malleable encryption the original simulation-based definition of [17] was
later shown to be equivalent to an indistinguishability-based definition [5]. For 
our case here, finding an equivalent indistinguishability-based definition for non- 
malleable hash functions appears to be far from trivial, and we leave the question 
as an interesting open problem. 

We then show that our definition can be met. Our construction of a non- 
malleable hash function employs a perfectly one-way hash function (POWHF) 
[ONT], i.e., a probabilistic hash function which hides all information about its pre- 
image. Notice that this form of secrecy in itself does not ensure non-malleability, 
so we make the function non-malleable by appending a simulation-sound
non-interactive zero-knowledge proof of knowledge (NIZKPoK) of the hashed
value. Both primitives exist, for example, if trapdoor permutations exist.

The construction we provide is probabilistic and does not achieve the desired 
level of efficiency for practical applications. We emphasize that our construction 
should be regarded as a feasibility result that shows that, in principle, non- 
malleable hash functions can be built from standard assumptions. We leave open 


2 Analogously to Canetti’s terminology of perfectly one-way hash functions [9] we refer
to our construction as a hash function since we require collision resistance, although 
it does not compress. 

3 We remark that the intuitively appealing approach of using non-malleable encryption 
or commitment schemes to directly construct non-malleable hashes does not work. 
One of the reasons is that the former primitives rely on secret randomness, whereas 
hash values need to be publicly verifiable given the pre-image. 
the problem of finding a practical, deterministic solution. We note that our
definition is general enough to allow such constructions. 

Next, we investigate necessary cryptographic assumptions for building non- 
malleable functions. We provide two results. First we show that a non-malleable 
hash function needs to hide any information about the pre-image. This result 
justifies the use of POWHFs in our construction. Then we show (in the style of 
Impagliazzo-Rudich [24]) that black-box constructions of non-malleable one-way 
functions from one-way permutations are in fact impossible even if the collision- 
resistance requirement is dropped. To be more precise, we follow the approach of 
Hsiao and Reyzin and show that no black-box security reduction is possible. 
Notice that our construction circumvents the impossibility result due to the use 
of a “non-black-box” NIZKPoK.

Finally, we study the applicability of our definition. We show that 
non-malleability is in fact sufficient for secure partial instantiation of the afore- 
mentioned encryption scheme of Bellare and Rogaway [4], i.e., that the scheme 
remains IND-CCA secure when H is replaced with a non-malleable hash func- 
tion. Although G is still a random oracle, this partial instantiation helps to 
better understand the necessary properties of the primitives and also provides a 
better security heuristic. 

We also sketch an application to the framework of cryptographic puzzles 
as a defense against DoS attacks, where non-malleability surfaces as an important 
property. The usefulness of the definition has also been shown in [19], using a 
special case of a preliminary version of our definition to prove that HMAC [3] is 
a secure message authentication code, assuming that the compression function 
of the hash function is non-malleable. We expect further applications of non- 
malleable hash functions in other areas, and some of the techniques used in our 
proof here may be helpful for these scenarios. 


RELATED WORK. Independently of our work, Canetti and Dakdouk and
Pandey et al. recently also suggested one-way functions with special
properties related to, yet different from, non-malleability, and Canetti and Varia [13]
investigated non-malleable obfuscation. The work of Canetti and Dakdouk
introduces the notion of extractable perfect one-way functions where generating
an image also guarantees that one knows a preimage. This should even hold if
an adversary sees related images, a setting which somewhat resembles the one
that we give for non-malleability. Yet, extractability in their work is defined by
requiring the existence of a knowledge extractor which generates a preimage from the
adversary’s view, including the other images. In contrast, the common approach
to non-malleability (which we also adopt) is to deny the simulator access to the
other images, in order to capture the idea that these images should not help.
Hence the security definition of Canetti and Dakdouk is incomparable to ours.
Moreover, using their notion to show insecurity of candidate practical hashes seems
difficult: arguing about the success of an attacker under their definition involves, in
particular, showing that it is impossible to extract a pre-image when someone
produces an image. In contrast, security as defined by our notion is easier to
refute. For example, the hash functions from [7] for which flipping a bit in the
pre-image results in flipping a bit in the image are clearly insecure under our
definition.

The work by Pandey et al. defines adaptive one-way function families 
where inversion for an image under some key is still infeasible, even if one is 
allowed to obtain preimages under different keys. This notion is also related to 
non-malleability and turns out to be useful to design non-malleable protocols 
like commitments and zero-knowledge proofs. Unfortunately, this strong notion 
is not known to be realizable. 

It is noteworthy that, analogously to our work here, both papers choose the 
Bellare-Rogaway encryption function as an important test case, and succeed 
in instantiating the second random oracle of the scheme. Together with the 
notion that we develop in this paper, these give three different alternatives for 
the requirements needed for this instantiation. Those works also show that the 
first random oracle could be instantiated in the standard model with a function 
which in addition to the notions they define is also pseudorandom. Unfortunately, 
no construction from standard assumptions that meets either one of the two 
resulting notions is known. In contrast, our single-oracle instantiation through a 
non-malleable hash function is possible under standard assumptions. 

The work by Canetti and Varia independently considers the notion of 
verifiable non-malleable obfuscation where an adversary, given an obfuscated 
circuit, tries to produce an (obfuscated) circuit which is functionally related. 
The adversary’s success is measured against the success of a simulator given 
only an oracle implementing the original circuit functionality. Their notion of 
verifiable non-malleable obfuscators comes closest to our notion of non-malleable 
hash functions, and their construction for achieving a weaker notion of verifiable 
non-malleable obfuscation resembles our feasibility construction closely. 

The two notions are, nonetheless, different in spirit. For obfuscators the adver- 
sary’s task is to find something functionally related, whereas for non-malleable 
hash functions the adversary’s task is to find a hash of a related pre-image, thus 
capturing relations about specific values like relations among the bits. There are 
further technical differences like the fact that the (achievable) notion of weakly
verifiable non-malleable obfuscators does not support auxiliary information (as
required for our encryption case, for example), making the two notions
incomparable. More details are given in Section 3.


2 Preliminaries 


Definition 1 (Hash Functions). A hash function H = (HK, H, HVf) consists
of PPTAs for key generation, evaluation and verification, where

— PPTA HK for security parameter $1^k$ outputs a key K (which contains $1^k$ and
  implicitly defines a domain $D_K$),
— PPTA H for inputs K and $x \in D_K$ returns a value $y \in \{0,1\}^*$,
— PPTA HVf on inputs K, x, y returns a decision bit.

It is required that for any $K \leftarrow HK(1^k)$, any $x \in D_K$, any $y \leftarrow H(K, x)$,
algorithm HVf(K, x, y) outputs 1.
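
To make the syntax concrete, the following Python sketch gives one possible instance of the triple (HK, H, HVf) with a probabilistic evaluation algorithm. It is an illustration only: the salted SHA-256 design and the names hk, h and hvf are assumptions of this sketch, and no claim is made that it is perfectly one-way or non-malleable.

```python
import hashlib
import os

SALT_BYTES = 16

def hk(k: int = 128) -> bytes:
    """HK(1^k): sample a public key K of k bits."""
    return os.urandom(k // 8)

def h(key: bytes, x: bytes) -> bytes:
    """H(K, x): probabilistic evaluation; the fresh salt is part of the value y."""
    salt = os.urandom(SALT_BYTES)
    return salt + hashlib.sha256(key + salt + x).digest()

def hvf(key: bytes, x: bytes, y: bytes) -> bool:
    """HVf(K, x, y): recompute the digest from the salt stored in y."""
    salt, digest = y[:SALT_BYTES], y[SALT_BYTES:]
    return hashlib.sha256(key + salt + x).digest() == digest
```

Two evaluations on the same input yield different values, yet both verify, which is exactly what the correctness requirement asks for.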
Note that we consider a very general syntax, comprising the “classical” notions
of one-way functions (with a public key) and of collision-resistant hash functions
which compress the input to a shorter digest (see the literature for definitions). In our
case the evaluation algorithm H may be probabilistic, as long as the correctness
of hash values is verifiable given the pre-image only (via HVf). Also, we do not
demand the length of the output of the hash function to be smaller than that of 
the input. However, while we capture a large class of primitives, the generalized 
syntax may not preserve all properties of the special cases, e.g., if the evaluation 
algorithm is probabilistic, two independent parties hashing the same input will 
not necessarily get the same value. 

We now recall the definitions of one-wayness and collision resistance. For one- 
wayness the definition that we give is more general than the standard one in that 
it considers specific input distributions $\mathcal{X}$ for the function, and also accounts for
the possibility that the adversary may have some partial information about the 
pre-image (modeled through a probabilistic function hint): 


Definition 2 (One-wayness and Collision-resistance). A hash function
H = (HK, H, HVf) is called

— one-way (wrt $\mathcal{X}$ and hint) if for any PPTA A the probability that for
  $K \leftarrow HK(1^k)$, $x \leftarrow \mathcal{X}(1^k)$, $h_x \leftarrow hint(K, x)$, $y \leftarrow H(K, x)$ and
  $x^* \leftarrow A(K, y, h_x)$ we have $HVf(K, x^*, y) = 1$, is negligible.

— collision-resistant if for any PPTA A the probability for $K \leftarrow HK(1^k)$,
  $(x, x', y) \leftarrow A(K)$ that $x \neq x'$ but $HVf(K, x, y) = 1$ and $HVf(K, x', y) = 1$,
  is negligible.


3 Non-malleability of Hash and One-Way Functions 


Our definition for hash functions follows the classical (simulation-based)
approach for defining non-malleability [17]. Informally, our definition requires that
for any adversary which, on input a hash value y, finds another value y* such 
that the pre-images are related, there exists a simulator which does just as well 
without ever seeing y. 

In the adversary’s attack we consider a three-stage process. The adversary
first selects a distribution $\mathcal{X}$ from which a secret input x is then sampled (and
passes on some state information). In the second stage the algorithm sees a
hash value y of this input x, and the adversary’s goal is to create another hash
value y* (usually different from y). In the third stage the adversary is given x
and now has to output a pre-image x* to y* which is “related” to x (we make
the definition stronger by giving the challenge pre-image to the adversary). The
simulator may also pick a distribution $\mathcal{X}$ according to which x is sampled, but
then it needs to specify x* directly from the key of the hash function only.

In the second stage the adversary (and consequently the simulator) also gets
as input a “hint” $h_x$ about the original pre-image x, to represent some a-priori
information potentially gathered from other executions of other protocols in
which x is used. In fact, such side information is often crucial for the deployment


530 A. Boldyreva et al. 


in applications, e.g., for the encryption example in Section 6. As in the case of
non-malleable commitments and encryption, related pre-images are defined via
a relation R(x, x*). This relation may also depend on the distribution $\mathcal{X}$ to catch
significantly diverging choices of the adversary and the simulator and to possibly
restrict the choices for $\mathcal{X}$, say, to require a certain min-entropy. However, unlike
for other primitives, we do not measure the success of the adversary and the
simulator for arbitrary relations R between x and x*, but instead restrict the
relations to a class $\mathcal{R}$ of admissible relations. We discuss this and other subtleties
after the definition:


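Definition 3 (NM-Hash). A hash function H = (HK, H, HVf) is called
non-malleable (with respect to probabilistic function hint and relation class $\mathcal{R}$)⁴ if for
any PPTA $A = (A_d, A_y, A_x)$ there exists a PPTA $S = (S_d, S_x)$ such that for
every relation $R \in \mathcal{R}$ the difference

$\Pr[\mathbf{Exp}^{\text{nm-1}}_{H,A}(k) = 1] - \Pr[\mathbf{Exp}^{\text{nm-0}}_{H,S}(k) = 1]$ is negligible, where:

Experiment $\mathbf{Exp}^{\text{nm-1}}_{H,A}(k)$
  $K \leftarrow HK(1^k)$
  $(\mathcal{X}, st_d) \leftarrow A_d(K)$
  $x \leftarrow \mathcal{X}(1^k)$, $h_x \leftarrow hint(K, x)$
  $y \leftarrow H(K, x)$
  $(y^*, st_y) \leftarrow A_y(y, h_x, st_d)$
  $x^* \leftarrow A_x(x, st_y)$
  Return 1 iff
    $R(\mathcal{X}, x, x^*)$
    $\wedge\ (x, y) \neq (x^*, y^*)$
    $\wedge\ HVf(K, x^*, y^*) = 1$

Experiment $\mathbf{Exp}^{\text{nm-0}}_{H,S}(k)$
  $K \leftarrow HK(1^k)$
  $(\mathcal{X}, st_d) \leftarrow S_d(K)$
  $x \leftarrow \mathcal{X}(1^k)$, $h_x \leftarrow hint(K, x)$
  $x^* \leftarrow S_x(h_x, st_d)$
  Return 1 iff
    $R(\mathcal{X}, x, x^*)$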
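
The two experiments can be played out against a toy hash in Python. Everything in this sketch is an assumption made for illustration: the hash leaks the first byte of x and hashes the rest with SHA-256 (so it is malleable by design), hint is empty, the distribution is uniform over 16-byte strings, the relation is "x* equals x with one bit flipped", and the simulator shown is just one guessing strategy.

```python
import hashlib
import os

X_LEN = 16  # pre-images are 16 uniformly random bytes (a well-spread distribution)

def hk() -> bytes:
    return os.urandom(16)

def h(key: bytes, x: bytes) -> bytes:
    # Toy malleable hash: first byte of x in the clear, remainder hashed.
    return x[:1] + hashlib.sha256(key + x[1:]).digest()

def hvf(key: bytes, x: bytes, y: bytes) -> bool:
    return h(key, x) == y

def flip(b: bytes) -> bytes:
    """Flip the lowest bit of the first byte."""
    return bytes([b[0] ^ 1]) + b[1:]

def relation(x: bytes, x_star: bytes) -> bool:
    return x_star == flip(x)

def exp_adversary() -> bool:
    """Experiment with the adversary: it sees y and later x."""
    key = hk()
    x = os.urandom(X_LEN)        # sampled from the distribution chosen by A_d
    y = h(key, x)
    y_star = flip(y)             # A_y mauls the hash value without knowing x
    x_star = flip(x)             # A_x opens y* once x is revealed
    return (relation(x, x_star)
            and (x, y) != (x_star, y_star)
            and hvf(key, x_star, y_star))

def exp_simulator() -> bool:
    """Experiment with a simulator: it never sees y and can only guess x*."""
    _key = hk()
    x = os.urandom(X_LEN)
    x_star = os.urandom(X_LEN)   # S_x outputs a guess from the key alone
    return relation(x, x_star)
```

The adversary wins every run, whereas for this relation any simulator must hit one specific 128-bit string that is independent of its view, so the toy hash fails the definition.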


REMARK 1. Our definition is parameterized by a class of relations $\mathcal{R}$. This is
because for some relations the definition is simply not achievable, as in the case
when the relation involves the hash of x instead of x itself. More specifically,
consider the relation R(x, x*) which parses x* as (K, y) and outputs HVf(K, x, y).
Then, an adversary on input y, $h_x$, $st_d$ may output $y^* \leftarrow H(K, (K, y))$ and then,
given x, returns x* = (K, y). This adversary succeeds in experiment $\mathbf{Exp}^{\text{nm-1}}_{H,A}(k)$
with probability 1. In contrast, any simulator is likely to fail, as long as the
hash function does not have “weak” keys, i.e., keys for which the distribution of
generated images is trivial (such that the simulator can guess y with sufficiently
high probability).
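
The problematic relation of this remark is easy to write down in Python. In the sketch below the hash is plain keyed SHA-256 and the pair (K, y) is encoded by concatenation with a fixed key length; both are assumptions of this illustration, as are all function names.

```python
import hashlib
import os

KEY_LEN = 16

def h(key: bytes, x: bytes) -> bytes:
    return hashlib.sha256(key + x).digest()   # deterministic toy hash

def hvf(key: bytes, x: bytes, y: bytes) -> bool:
    return h(key, x) == y

def relation(x: bytes, x_star: bytes) -> bool:
    """R(x, x*): parse x* as (K, y) and output HVf(K, x, y)."""
    key, y = x_star[:KEY_LEN], x_star[KEY_LEN:]
    return hvf(key, x, y)

def run_adversary() -> bool:
    key = os.urandom(KEY_LEN)
    x = os.urandom(16)
    y = h(key, x)
    y_star = h(key, key + y)      # y* <- H(K, (K, y)), computed from y alone
    x_star = key + y              # x* = (K, y), output in the last stage
    return (relation(x, x_star)
            and (x, y) != (x_star, y_star)
            and hvf(key, x_star, y_star))

def run_guessing_simulator() -> bool:
    key = os.urandom(KEY_LEN)
    x = os.urandom(16)
    x_star = key + os.urandom(32) # without y, the simulator can only guess it
    return relation(x, x_star)
```

The adversary succeeds in every run because the relation is satisfied by the hash value it was handed, while a simulator would have to predict that value.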

We resolve this problem by requiring the definition to hold for a subset R of 
all relations. It is of course desirable to seek secure constructions with respect 


4 Throughout the paper all hint functions and relations are assumed to be efficient. We
furthermore assume that the security parameter is given in unary to all algorithms 
as additional input (if not mentioned explicitly). 
to very broad classes of relations (cf. our construction in Section 4) which are
more handy for general deployment. At the same time, certain scenarios may
only require non-malleability with respect to a small set of relations (cf. the
application example discussed later in the paper). Our definition is general and permits
easy tuning for the needs of a particular application or a class of applications. 


REMARK 2. For virtually all “interesting” functions H and relation classes $\mathcal{R}$ the
definition is achievable only for adversaries and simulators that output
descriptions of well-spread distributions $\mathcal{X}$ (i.e., with super-logarithmic min-entropy).
For the construction in the next section we also require hint to be a so-called
uninvertible function (for which finding the exact pre-image is infeasible). Note
that uninvertibility is a weaker requirement than one-wayness, as it holds for 
example for constant functions. We prefer to keep the definition as general as 
possible, so we do not explicitly impose such restrictions on the adversary, sim- 
ulator, and hint. 


REMARK 3. In our definition we demand that the simulator outputs x* given
K and $h_x$ only. A weaker condition would be to have a simulator $S_y(h_x, st_d)$
first output y*, like the adversary $A_y$, and then $x^* \leftarrow S_x(x, st_y)$, before checking
that $R(\mathcal{X}, x, x^*)$ and that HVf(K, x*, y*) = 1. Since in this case the simulator
in the second stage is also given x we call this a weak simulator and hash func- 
tions achieving this notion weakly non-malleable. This distinction resembles the 
notions of non-malleable commitments with respect to commitment and with re- 
spect to opening [620]. Depending on the application scenario of non-malleable 
hash functions the stronger or weaker version might be required. As an exam- 
ple, the result about the Bellare-Rogaway encryption scheme uses the stronger 
definition above, and our construction in the next section achieves this stronger 
notion, which obviously implies the weaker one. 


REMARK 4. Similarly to the previous variation one can let the adversary only
output a hash value y*, and omit the step where it later also has to give x*.
The simulator’s task, too, is then to only output a hash value. Then one defines
meaningful relations through existential quantifications (“...if there exists a
pre-image x* such that R(x, x*) holds”). This is essentially the approach taken by
Canetti and Varia [13] for (weakly) verifiable non-malleable obfuscators.

On the one hand the “hash-only” approach above facilitates the adversary’s
task if it does not need to know a specific pre-image. On the other hand, it also
simplifies the simulator’s task. As an example the adversary in our definition may
decide upon a specific x* satisfying the relation, after seeing x. Security against
such an attack cannot be captured by the above notion of relaxed simulators,
whereas the simulator in our definition also needs to find an appropriate x*.
This particular example demonstrates that our approach and the definition for
(weakly) verifiable non-malleable obfuscators in [13] are incomparable. Further
differences between the notions are the lack of auxiliary information and the
dependency of the simulator on the relation in the definition of Canetti and
Varia [13]. In addition, the feasibility results presented later in our paper and
the solutions in [13] are for incomparable classes of relations.


REMARK 5. Note that we only demand that (x, y) ≠ (x*, y*) for the adversary’s
choice (instead of demanding x ≠ x* or y ≠ y*), yielding a stronger
definition, especially when the randomized hash function has multiple images
for some input. Again, the particular need depends on the application and our
solution meets this stronger requirement.


REMARK 6. In the case of non-malleable encryption the original simulation-based
definition of [17] was later shown to be equivalent to an indistinguishability-based
definition [5]. The superficial similarity between our definition of non-malleable
hash functions and the one of non-malleable encryption suggests that this may 
be possible here as well. Surprisingly, straightforward attempts to define non- 
malleability of hash functions through indistinguishability do not seem to yield 
an equivalent definition. We discuss this issue in the full version [6] in more detail 
(because of lack of space), and leave it as an interesting open problem to find a 
suitable indistinguishability-based definition for non-malleable hash functions. 


REMARK 7. The usual security notions for hash functions include one-wayness
and collision-resistance. However, neither property is known to follow from
Definition 3. Consider a constant function H, which is clearly neither one-way nor
collision-resistant. But the function is weakly non-malleable, as a simulator can
simulate A in a black-box way by handing the adversary the constant value. We
keep these rather orthogonal security properties separate, as some applications 
may require one but not the others. 


REMARK 8. Some applications (like the HMAC example in [19]) require a
multi-valued version of the definition in which the adversary can adaptively generate
several distributions and receive the images (with side information) before
deciding upon y*. One can easily extend our definition accordingly, letting $A_d$ loop
several times, in each round i generating a distribution $\mathcal{X}_i$ and receiving $y_i$ and
$h_{x_i}$ at the beginning of the next round and before outputting an image y*. In
general, it is possible to extend our construction to this case using stronger,
adaptive versions of POWHFs and NIZKPoKs. See Remark 1 after the theorem for our construction.


4 Constructing Non-malleable Hash Functions 


In this section we give feasibility results via constructions for non-malleable hash 
functions. The main ingredient of our constructions is a perfectly one-way hash 
function (POWHF) [DIJ], which hides all information about the pre-image but 
which may still be malleable [7]. To ensure non-malleability we tag the hash value 
with a simulation-sound non-interactive zero-knowledge proof of knowledge of 
the pre-image. We first recall the definitions of these two primitives. 

For POWHFs we slightly adapt the existing definition to our setting.
Originally, POWHFs have been defined to have a specific input distribution $\mathcal{X}$ (like
the uniform distribution in [2M8] ). Here we let the adversary choose the input 
distribution adaptively, and merely demand that this distribution $\mathcal{X}$ satisfies
a certain efficient predicate $P_{pow}(\mathcal{X})$; this is analogous to the non-malleability
experiment in which the adversary chooses $\mathcal{X}$ and the relation R takes $\mathcal{X}$ as
additional input. We call the side information here aux (as opposed to hint for 
non-malleability) in order to distinguish between the two primitives. In fact, in 
our construction aux uses hint as a subroutine but generates additional output. 


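Definition 4 (POWHF). A hash function P = (POWK, POW, POWVf) is
called a perfectly one-way hash function (with respect to predicate $P_{pow}$
and probabilistic function aux) if it is collision resistant, and if for any
PPTA $B = (B_d, B_b)$, where $B_b$ has binary output, the following random
variables are computationally indistinguishable:

$$
\begin{array}{l|l}
K \leftarrow \mathsf{POWK}(1^k) & K \leftarrow \mathsf{POWK}(1^k) \\
(\mathcal{X}, st_d) \leftarrow B_d(K) & (\mathcal{X}, st_d) \leftarrow B_d(K) \\
x \leftarrow \mathcal{X}(1^k) & x \leftarrow \mathcal{X}(1^k),\ x' \leftarrow \mathcal{X}(1^k) \\
a_x \leftarrow \mathsf{aux}(K, x);\ y \leftarrow \mathsf{POW}(K, x) & a_x \leftarrow \mathsf{aux}(K, x);\ y' \leftarrow \mathsf{POW}(K, x') \\
b \leftarrow B_b(y, a_x, st_d) & b \leftarrow B_b(y', a_x, st_d) \\
\text{return } (K, x, b) \text{ if } P_{pow}(\mathcal{X}) = 1 & \text{return } (K, x, b) \text{ if } P_{pow}(\mathcal{X}) = 1 \\
\text{else } \bot & \text{else } \bot
\end{array}
$$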


REMARK 1. As pointed out in [], the definition only makes sense if aux is an
uninvertible function of the input (such that finding the pre-image $x$ from
$a_x$ is infeasible) and $B_d$ only outputs descriptions of well-spread
distributions (with super-logarithmic min-entropy). Otherwise the notion is
impossible to achieve. For generality, we do not restrict $\mathcal{X}$ and aux
explicitly here.


REMARK 2. Perfectly one-way hash functions (in the sense above) can be
constructed from any one-way permutation (for the uniform input distribution),
any regular collision-resistant hash function (for any distribution with
fixed, super-logarithmic min-entropy), or under the decisional Diffie-Hellman
assumption (for the uniform distribution). Usually these general constructions
are not known to be secure assuming arbitrary functions aux, yet for the
particular function aux required by the application they can often be adapted
accordingly. A concrete example is given in Section 6 in our discussion of the
Bellare-Rogaway encryption scheme.


ON THE CHOICE OF THE RELATION CLASS. Recall that the definition of non- 
malleability is parametrized by a class of relations. As explained earlier in the 
paper, no non-malleable hash function for an arbitrary class exists (see Remark 1 
after Definition B). In the sequel, we exhibit a class of relations for which we show 
how to construct non-malleable hash functions, and then present our provably 
secure construction. 

Specifically, we consider the class of relations $\mathcal{R}^{rinfo}_{pred}$,
parameterized by an optional function rinfo, which consists of all relations of
the form $R(x, x^*) = P(x, P^*(\mathsf{rinfo}(x), x^*))$ for all efficient
predicates $P, P^*$ (see footnote 5). The function rinfo(x) may be empty or
consist of a small fraction of bits of $x$ (e.g., up to logarithmically many),
and should be interpreted as the information about $x$ that may be used in
evaluating the relation $R$. It is important that rinfo is an uninvertible
function, as otherwise, if one could recover $x$ from rinfo(x), then
$\mathcal{R}^{rinfo}_{pred}$ would comprise all efficient relations
$R(x, x^*) = P^*(x, x^*)$, and non-malleability with respect to this class,
again, would not be achievable.


5 Where we neglect the distribution $\mathcal{X}$ as part of the relation's input for the moment.

As an example consider the empty function rinfo such that
$\mathcal{R}_{pred}$ consists of all relations $R(x, x^*) = P(x, P^*(x^*))$.
This class of relations allows one to check, for instance, that individual bits
of $x$ and $x^*$ are complements of each other, i.e., if $\pi_j$ denotes the
projection onto the $j$-th bit then one sets $P^*(x^*) = \pi_j(x^*)$ and lets
$P(x, P^*(x^*))$ output 1 if $\pi_j(x) \neq \pi_j(x^*)$. This example has also
been used by Boldyreva and Fischlin [f] to show the necessity of
non-malleability for OAEP, and to give an example of a perfectly one-way hash
function that is malleable in the sense that flipping the first bit of an image
produces a hash of the pre-image whose first bit is also flipped.

In the examples above rinfo has been the empty function. Of course, using
non-trivial functions rinfo allows for additional relations and enriches the
class $\mathcal{R}^{rinfo}_{pred}$. Consider for example a hash function $H$
that is malleable in the sense that an adversary, given $H(K, r\|m)$ for random
$r \in \{0,1\}^k$, can compute $H(K, r\|m')$ for some $m' \neq m$. One way to
capture that the two pre-images coincide on the first $k$ bits is to set
$\mathsf{rinfo}(r\|m) = r$ and to set $P^*(r, x^*) = 1$ if and only if $r$ is
the prefix of $x^*$. Since rinfo should be uninvertible, the function should
rather return only a fraction of $r$, though. Similarly, one can see that the
class $\mathcal{R}^{rinfo}_{pred}$ "captures" relations like $R(x, x^*) = 1$ iff
$x \oplus x^* = \delta$ for some constant $\delta$, and many other useful
relations.
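
The three example relations above can be written down concretely. The following is a minimal sketch only: it assumes bit strings are represented as Python strings of '0' and '1', and the helper names (`make_relation`, `bit_complement_relation`, `prefix_relation`, `xor_relation`) are hypothetical, introduced here for illustration. It ignores the distribution input and the predicate on it.

```python
def make_relation(P, P_star, rinfo):
    """Build R(x, x*) = P(x, P*(rinfo(x), x*)) from its three components."""
    def R(x, x_star):
        return bool(P(x, P_star(rinfo(x), x_star)))
    return R

def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if u != v else "0" for u, v in zip(a, b))

def bit_complement_relation(j):
    """Empty rinfo: holds iff the j-th bits of x and x* differ."""
    rinfo = lambda x: ""                      # no side information about x
    P_star = lambda info, x_star: x_star[j]   # projection onto the j-th bit
    P = lambda x, b: x[j] != b
    return make_relation(P, P_star, rinfo)

def prefix_relation(leak):
    """rinfo reveals only the first `leak` bits; holds iff x* starts with them."""
    rinfo = lambda x: x[:leak]
    P_star = lambda info, x_star: x_star.startswith(info)
    P = lambda x, ok: ok
    return make_relation(P, P_star, rinfo)

def xor_relation(delta):
    """Empty rinfo: holds iff x XOR x* equals the constant delta."""
    rinfo = lambda x: ""
    P_star = lambda info, x_star: xor_bits(x_star, delta)  # candidate for x
    P = lambda x, z: x == z
    return make_relation(P, P_star, rinfo)
```

In each case $P^*$ sees only rinfo(x) and $x^*$, while $P$ sees $x$ and the output of $P^*$, which is the structural restriction that keeps the class achievable.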

Finally, we note that each relation from the class also checks that the chosen
input distribution $\mathcal{X}$ "complies" with the eligible distributions from
the underlying POWHF. That is, each relation also checks that the predicate
$P_{pow}(\mathcal{X})$ of the POWHF is satisfied. The full relation
$R(\mathcal{X}, x, x^*)$ then evaluates to 1 iff
$P(x, P^*(\mathsf{rinfo}(x), x^*)) = 1$ and $P_{pow}(\mathcal{X}) = 1$. More
formally, for any predicate $P_{pow}$ and uninvertible function rinfo we define
the class of relations:

$$
\mathcal{R}^{rinfo, P_{pow}}_{pred} = \left\{ R \;:\;
\begin{array}{l}
\text{there exist efficient (probabilistic) predicates } P, P^* \\
\text{such that } R(\mathcal{X}, x, x^*) = P(x, P^*(\mathsf{rinfo}(x), x^*)) \wedge P_{pow}(\mathcal{X})
\end{array}
\right\}
$$


Our construction also uses a simulation-sound zero-knowledge proof of knowledge
$\Pi = (\mathsf{CRS}, \mathsf{P}, \mathsf{V})$ for the NP-relation $R_{pow}$ defined by:

$$
R_{pow} = \{ (K_{pow} \| y_{pow},\ x \| r) \;:\; \mathsf{POW}(K_{pow}, x; r) = y_{pow} \},
$$

which essentially says that one "knows" a pre-image of a hash value.
Simulation-sound NIZK proofs of knowledge for such relations can be derived from
trapdoor permutations BIMA]. We recall the definition of the former in the full
version.


THE CONSTRUCTION AND ITS SECURITY. The following theorem captures the 
security of our construction. 




Theorem 1. Let P = (POWK, POW, POWVf) be a perfectly one-way hash function
with respect to $P_{pow}$ and aux, where aux = (hint, rinfo) for probabilistic
functions hint and rinfo. Let $\Pi = (\mathsf{CRS}, \mathsf{P}, \mathsf{V})$ be a
simulation-sound non-interactive zero-knowledge proof of knowledge for relation
$R_{pow}$. Then the following hash function H = (HK, H, HVf) is non-malleable
with respect to hint and $\mathcal{R}^{rinfo, P_{pow}}_{pred}$:

— PPTA HK on input $1^k$ samples $K_{pow} \leftarrow \mathsf{POWK}(1^k)$ and
$crs \leftarrow \mathsf{CRS}(1^k)$ and outputs $K = (K_{pow}, crs)$. The
associated domain $D_K$ is given by $D_{K_{pow}}$.

— PPTA H on input $K$ and $x \in D_K$ computes
$y_{pow} \leftarrow \mathsf{POW}(K_{pow}, x; r)$ for random
$r \leftarrow \mathsf{RND}_K$ as well as
$\pi \leftarrow \mathsf{P}(crs, K_{pow} \| y_{pow}, x \| r)$. It outputs
$y = (y_{pow}, \pi)$.

— PTA HVf for inputs $K = (K_{pow}, crs)$, $x$ and $y = (y_{pow}, \pi)$ outputs 1
if and only if
$\mathsf{POWVf}(K_{pow}, x, y_{pow}) = 1$ and $\mathsf{V}(crs, K_{pow} \| y_{pow}, \pi) = 1$.


In addition, H is collision-resistant.


Due to space limitations we provide the detailed proof in the full version of the 
paper [6]. 


REMARK 1. The malleability adversary has access to essentially two different
sources of partial information about $x$: hint(x), which it receives explicitly
as input, and rinfo(x), which it can use indirectly through the relation $R$.
This motivates the requirement that P be perfectly one-way with respect to
partial information aux = (hint, rinfo).


REMARK 2. As mentioned after the definition of non-malleable hash functions, 
some applications (like the one about HMAC [19]) may require a stronger no- 
tion in which the adversary can adaptively generate distributions and receives 
the images, before deciding upon y*. Our construction above can be extended 
to this case, assuming that the POWHF obeys a corresponding “adaptiveness” 
property and that the zero-knowledge proof of knowledge is multiple simulation- 
sound and multiple zero-knowledge. Such adaptively-secure POWHFs (for uni- 
form distributions) can be built from one-way permutations [I8] and suitable 
zero-knowledge proofs exist, assuming trapdoor permutations BMA. 


5 On the Complexity of Non-malleable Functions 


In this section we discuss the existential complexity of non-malleable functions. 
We first indicate, via an oracle separation result, that deriving non-malleable 
hash and one-way functions via one-way permutations is infeasible. In the full 
version [6] we also discuss the relation between non-malleability and one-wayness. 


5.1 On the Impossibility of Black-Box Reductions 


We first show that, under reasonable conditions, there is no black-box reduction 
from non-malleable hash functions (which might not even be collision-resistant 
but rather one-way only) to one-way permutations. For space reasons most of
the proofs have been moved to the full version of the paper [6]. 


BLACK-Box REDUCTIONS. In their seminal paper Impagliazzo and Rudich [24] 
have shown that some cryptographic primitives cannot be derived from other 
primitives, at least if the starting primitive is treated as a black box. Instead of 
separating primitives as in here we follow the more accessible approach of 
Hsiao and Reyzin [23], giving a relaxed separation result with respect to black- 
box security reductions. We give a formalization of the oracle-based black-box 
separation approach that we use in the full version. 

For our result we assume that the algorithms of the hash function H are
granted oracle access to a random permutation oracle P (which is one-way, of
course). A black-box reduction to P is now an algorithm which, with oracle
access to P and a putative successful attacker A on the non-malleability
property, inverts P with noticeable probability. Such an attacker A may take
advantage of another oracle O (related to P) which allows it to break the
non-malleability but does not help to invert the one-way permutation P. Since
neither the construction nor the reduction is given access to O, the reduction
must be genuinely black-box.


DEFINING ORACLES P AND O. For now we let P be a random permutation
oracle which in particular is a one-way function. Below we show through
derandomization techniques that some fixed P must also work. For our separation
we let the side information of the non-malleable hash function include
an image of the uniformly distributed input $x$ under P. More precisely,
consider the function $\mathsf{hint}^{P}_{sep}$, which on input $(1^k, K, x)$
for random $x$ computes
$h_x = P(0^k \| x \| \langle \mathsf{HVf} \rangle \| K)$ for the description
$\langle \mathsf{HVf} \rangle$ of the verification algorithm and finally outputs
$h_x$ (see footnote 6).

We next construct the oracle O that helps to break non-malleability. The
idea is that using O it is possible to extract from the image $y$ and "hint"
$h_x$ (described above) the pre-image $x$ of $y$. Since the adversary gets $y$
as input, but the simulator does not, the oracle is only helpful to the
adversary. Note that breaking non-malleability means that no simulator of
comparable complexity is able to approximate the success probability of
$A^{P,O}$ closely. To ensure that the simulator has the same power as
$A^{P,O}$, we therefore grant the simulator $S^{P,O}$ access to both oracles
P, O.


Construction 1. Let oracle O take as input a parameter $1^k$, an image $y$ and
a "hint" $h_x$. The oracle first finds the pre-image
$z \| x \| \langle \mathsf{HVf} \rangle \| K$ of $h_x$ under P and verifies that
$z = 0^k$; if not it immediately returns $\bot$. Else it checks that
$\mathsf{HVf}^{P}(K, x, y) = 1$ and returns $x$ if so (and outputs $\bot$ otherwise).
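
To make the interplay of the hint function and the oracle O concrete, here is a toy sketch under loud assumptions. The random permutation P is replaced by a lazily sampled random injection with an inverse table (so tags that P never produced simply have no pre-image here, whereas a true permutation would have one), the description of HVf is modeled by the Python function object itself, and `toy_hash`/`toy_hvf` are hypothetical stand-ins based on SHA-256. None of these names come from the paper.

```python
import hashlib
import os

class RandomInjection:
    """Lazily sampled stand-in for the random permutation oracle P."""
    def __init__(self):
        self.fwd, self.inv = {}, {}

    def query(self, msg):
        # Sample a fresh 16-byte image on first use and remember the inverse.
        if msg not in self.fwd:
            tag = os.urandom(16)
            while tag in self.inv:
                tag = os.urandom(16)
            self.fwd[msg] = tag
            self.inv[tag] = msg
        return self.fwd[msg]

def toy_hash(K, x):
    """Hypothetical keyed hash used only for this illustration."""
    return hashlib.sha256(K + x).digest()

def toy_hvf(K, x, y):
    """Public verification: does y match the hash of x under key K?"""
    return toy_hash(K, x) == y

def hint_sep(P, k, hvf, K, x):
    """h_x = P(0^k || x || <HVf> || K), with the tuple modeling concatenation."""
    return P.query(("0" * k, x, hvf, K))

def oracle_O(P, k, y, h_x):
    """Invert h_x under P, check the 0^k padding and HVf, then release x."""
    pre = P.inv.get(h_x)
    if pre is None:          # toy simplification: no known pre-image
        return None
    z, x, hvf, K = pre
    if z != "0" * k:
        return None          # corresponds to returning "bottom"
    return x if hvf(K, x, y) else None
```

The oracle releases $x$ only to a caller that already holds a valid image $y$, which is why it helps the adversary (who is given $y$) but not a party who only sees $h_x$.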


6 We note that the side information $h_x$ does not reveal any essential information about
$x$ in the sense that one can show that, for any non-malleable hash function for the
uniform input distribution and no side information at all, the hash function remains
non-malleable with respect to $h_x$ relative to the random permutation P (but not
relative to O, of course). Also observe that the common strategy of using black-box
simulators usually works for any side information, and in particular for the one here.
We show that O does not help to invert P, thus showing that relative to the
oracles there still exist one-way permutations:


Proposition 1. For any efficient algorithm $B^{P,O}$, the probability that
$B^{P,O}$ breaks the one-wayness of P is negligible.


In light of this proposition we conclude that there exists a particular P that
is hard to invert for all PPT adversaries with oracles P, O. The argument is the
same as in [23]. For a fixed PPT adversary B, we define the sequence of events
(indexed by $k$) where B inverts strings of length $k$ with some good
probability; for a suitable choice of parameters, the sum of the probabilities
(over P) of these events converges and by the first Borel-Cantelli lemma only
finitely many of these events may occur, almost surely. Then taking the
countable intersection over all PPT B, we get that there is at least one P with
the desired property.


SEPARATION. We require some mild, technical conditions for our non-malleable 
hash function and the relation. Namely, we assume that 


— the hash function is non-trivial, meaning that it is infeasible to predict an
image for uniformly distributed input over $\{0,1\}^k$ (thus ruling out trivial
examples like constant hash functions), and

— the relation class $\mathcal{R}$ contains the relation $R_{sep}$ which on input
$(\mathcal{X}, x, x^*)$ checks that $\mathcal{X}$ is the uniform distribution on
$\{0,1\}^k$, and that
$\mathrm{parity}(x) = \bigoplus_i x_i = \mathrm{parity}(x^*) = \bigoplus_i x^*_i$.
Note that $R_{sep} \in \mathcal{R}_{pred}$ for our predicate-based relations,
even for the empty function rinfo, and can thus be achieved in principle.


Theorem 2. Let $H^{P} = (\mathsf{HK}^{P}, \mathsf{H}^{P}, \mathsf{HVf}^{P})$ be a
non-trivial non-malleable hash function with respect to
$\mathsf{hint}^{P}_{sep}$ and $\mathcal{R} \ni R_{sep}$. Then there exists an
adversary $A^{P,O}$ that breaks non-malleability of $H^{P}$ (for any simulator
$S^{P,O}$).


Corollary 1. There exists no black-box reduction from non-trivial non-malleable
functions (with respect to $\mathsf{hint}^{P}_{sep}$ and
$\mathcal{R} \ni R_{sep}$) to one-way permutations.


At first glance it seems as if our result would transfer (after some minor mod- 
ifications) to other non-malleable primitives like commitments. This is not the 
case. The oracle O in our construction relies on the ability to check whether a 
pre-image x matches an image y (public verifiability of hash functions), while 
other primitives such as encryption E(m;r) and commitments Com(m; r) use 
hidden randomness (which is not part of the input of function hint). 


RELATING NON-MALLEABILITY AND PERFECT ONE-WAYNESS. In the full ver- 
sion we show that non-malleability implies a variant of perfect-one-wayness. 


6 Applications 


In this section we study the usefulness of our notion for cryptographic
applications. As an example we show that when one of the two random oracles in
the aforementioned encryption scheme proposed by Bellare and Rogaway in [4] is
instantiated with a non-malleable hash function, the scheme remains IND-CCA
secure. In addition, we argue that non-malleability is useful in preventing off-line
computation attacks against a certain class of cryptographic puzzles.


INSTANTIATING RANDOM ORACLES. We start by recalling the scheme. Let F
be a family of trapdoor permutations and G, H be random oracles. The message
space of the scheme $\mathrm{BR}^{G,H}[F] = (K, E, D)$ is the range of G. The
key generation algorithm K outputs a random F-instance $f$ and its inverse
$f^{-1}$ as the public and secret key, respectively. The encryption algorithm E
on inputs $f$ and $m$ picks random $r$ in the domain of $f$ (we assume that
$r \in \{0,1\}^k$) and outputs $(f(r), G(r) \oplus m, H(r\|m))$. The decryption
algorithm D on inputs $f^{-1}$ and $(y, g, h)$ first computes
$r \leftarrow f^{-1}(y)$, then $m \leftarrow g \oplus G(r)$, and outputs $m$ iff
$H(r\|m) = h$. The scheme $\mathrm{BR}^{G,H}[F]$ is proven to be IND-CCA secure
in the random oracle model assuming that F is one-way [4].
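
The encryption and decryption steps just recalled can be sketched as follows. This is an illustrative toy, not a secure implementation: the trapdoor permutation is textbook RSA over a modulus built from two small Mersenne primes (an assumption made purely so the example runs), the random oracles G and H are approximated by domain-separated SHA-256, $r$ is drawn from $Z_n$ rather than from $\{0,1\}^k$, and all function names are hypothetical. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
import secrets

MSG_LEN = 16  # bytes; messages live in the range of G

def keygen():
    # Toy RSA trapdoor permutation on Z_n (insecure parameter sizes).
    p, q, e = 2**31 - 1, 2**61 - 1, 65537
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # secret exponent, the "inverse" f^{-1}
    return (n, e), (n, d)

def _enc(r):
    return r.to_bytes(16, "big")        # n < 2^92, so 16 bytes suffice

def G(r):
    return hashlib.sha256(b"G" + _enc(r)).digest()[:MSG_LEN]

def H(r, m):
    return hashlib.sha256(b"H" + _enc(r) + m).digest()

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def encrypt(pk, m):
    n, e = pk
    r = secrets.randbelow(n)            # random point in the domain of f
    return pow(r, e, n), xor(G(r), m), H(r, m)   # (f(r), G(r) xor m, H(r||m))

def decrypt(sk, c):
    n, d = sk
    y, g, h = c
    r = pow(y, d, n)                    # r <- f^{-1}(y)
    m = xor(g, G(r))                    # m <- g xor G(r)
    return m if H(r, m) == h else None  # output m iff H(r||m) = h
```

The final check in `decrypt` is the place where the partial instantiation discussed next replaces the equality test by a call to the verification algorithm HVf.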

Here we study the possibility of realizing the random oracle with an actual
hash function family $\mathcal{H}$ = (HK, H, HVf), a so-called partial
H-instantiation of the scheme. More precisely, we modify the scheme so that the
public key and secret key also contain a key $K \leftarrow \mathsf{HK}(1^k)$
specifying a function. Then E computes $H(K, r\|m)$ instead of $H(r\|m)$, and D
computes $\mathsf{HVf}(K, r\|m, h)$ instead of checking that $H(r\|m) = h$. We
refer to this scheme as $\mathrm{BR}^{G,\mathcal{H}}[F]$. The following shows
that functions that meet our notion of non-malleability are sufficient for
a secure partial H-instantiation.

Before stating the sufficient conditions for security to hold, we fix some
notation. Below we let the function
$\mathsf{rinfo}_{BR}(x) = \mathsf{msb}_{k/2}(x)$ output the $k/2$ most
significant bits of its input. The class of relations we require here for
non-malleability is only a subset of the achievable class discussed in Section 4.
Namely, we only require a relation of the form
$R_{BR}(\mathcal{X}, x, x^*) = P^*(\mathsf{rinfo}_{BR}(x), x^*) \wedge P_{pow}(\mathcal{X})$,
where $P_{pow}$ is the predicate that checks that $\mathcal{X}$ is the canonical
representation of the uniform distribution on the first $k$ bits, and $P^*$ is
the predicate that simply verifies that
$\mathsf{msb}_{k/2}(x^*) = \mathsf{rinfo}_{BR}(x)$. We choose this specific
predicate $R_{BR}$ so that it can check if $x = x^*$, while erring with only
negligible probability, but still admit the construction of non-malleable hash
functions.

Below we will require that the trapdoor permutation family is
$\mathsf{msb}_{k/2}$-partial one-way, meaning that it is hard to compute the
$k/2$ most significant bits of the random input $r$ given a random instance $f$
and $f(r)$ (cf. ZI] for the formal definition). This is a rather mild assumption
to impose on F. For example, RSA was shown to be partial one-way under the RSA
assumption in BI. A general approach to construct such a partial one-way family
F is to define
$f(r) = g(\mathsf{msb}_{k/2}(r)) \| g(\mathsf{lsb}_{k/2}(r))$ for a trapdoor
permutation $g$ (see footnote 7).


7 In fact, this construction also has the useful property that $f(r)$ is still hard to invert,
even if given $\mathsf{msb}_{k/2}(r)$. Thus this trapdoor permutation is suitable for constructing
POWHFs secure with respect to side information $(\mathsf{msb}_{k/2}(r), f(r))$ and therefore, via
our construction, non-malleable hash functions for side information $\mathsf{hint}_{BR}(r) = f(r)$
and the relation $R_{BR}$. In other words, non-malleable hash functions for $\mathsf{hint}_{BR}$ and
$R_{BR}$ exist under common cryptographic assumptions.
We need one more technical detail before stating the theorem. We start with
some hash function family H = (HK, H, HVf) and trapdoor permutation family
F. We write H = ($\mathsf{HK}_F$, H, HVf) for the modified hash function for
which key generation outputs a random instance of F along with the original hash
key. Below we write $\mathsf{hint}_{BR}$ for the function that takes as input a
key $(K, f)$ and string $x$, and outputs $f(r)$, where $r$ are the first $k$
bits of the input $x$. We note that the IND-CPA version of the scheme by Bellare
and Rogaway was shown secure in the standard model by Canetti RD], assuming the
hash function is a POWHF with respect to a similar hint function.


Theorem 3. Let F be an $\mathsf{msb}_{k/2}$-partial one-way trapdoor permutation
family and let H = ($\mathsf{HK}_F$, H, HVf) be a collision-resistant hash
function which is non-malleable with respect to the function
$\mathsf{hint}_{BR}$ and to the relation $R_{BR}$. Assume further that H is a
perfectly one-way hash function with respect to $P_{pow}$ and
$\mathsf{hint}_{BR}$. Then $\mathrm{BR}^{G,\mathcal{H}}[F]$ is IND-CCA secure (in
the RO model).


REMARK. Although the non-malleability property of the hash implies that no
partial information about pre-images is leaked (cf. the full version for a formal
statement of this implication), the theorem above requires the hash to be
perfectly one-way in the sense of Definition 4, which is a stronger requirement
in general. The proof of the theorem is in the full version [6].


APPLICATION TO CRYPTOGRAPHIC PUZZLES. Cryptographic puzzles are a defense
mechanism against denial of service (DoS) attacks. The idea is that, before
spending any resources for the execution of a session between a client and a
server, the server requires the client to solve a puzzle. Since solving puzzles
requires spending cycles, the use of puzzles prevents a malicious client from
engaging in a large number of sessions without itself spending a significant
amount of resources. One desirable condition is that the server does not store
any client-related state.

A simple construction for such puzzles proposed by Juels and Brainard
is based on an arbitrary one-way function $h : \{0,1\}^l \to \{0,1\}^l$. First,
select at random $x \leftarrow \{0,1\}^l$ and compute $y = h(x)$. Then, a puzzle
is given by the tuple $(x[1..l-k], y)$ consisting of the first $l-k$ bits of $x$
together with $y$. To prove it solved the puzzle, the client has to return
$(x, y)$. It can be easily seen that the construction above is not entirely
satisfactory. In particular, it either fails against replay attacks (where the
clients present the same puzzle-solution pair to the server) or the server needs
to store all of the $x$'s used to compute the puzzles.

The solution proposed to mitigate the above problem is to compute $x$ as
$H(S, t)$, where $S$ is some large bitstring known only to the server, and $t$
is some bitstring that somehow "expires" after a certain amount of time (this
can be for example the current system time). The puzzle is then given by
$(t, x[1..l-k], y)$, where $y = h(x)$. A solution (or solved puzzle) is
$(t, x, y)$ which needs to satisfy the obvious equations, and moreover, $t$ is
not an expired bitstring.
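
The stateless variant can be sketched as below. The concrete choices are assumptions made for illustration only: $H(S, t)$ is instantiated with HMAC-SHA256 truncated to $l = 64$ bits, $h$ with SHA-256 truncated to 64 bits, $t$ is an integer timestamp, and $k = 12$ bits are hidden so that brute force finishes quickly. The function names are hypothetical.

```python
import hashlib
import hmac

L_BITS = 64   # length l of x in bits
K_BITS = 12   # number k of hidden bits the client must search

def H(S, t):
    """x = H(S, t): keyed hash of the time value, truncated to l bits."""
    mac = hmac.new(S, t.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:L_BITS // 8], "big")

def h(x):
    """One-way function h on l-bit strings (truncated SHA-256)."""
    digest = hashlib.sha256(x.to_bytes(L_BITS // 8, "big")).digest()
    return int.from_bytes(digest[:L_BITS // 8], "big")

def make_puzzle(S, t):
    """Server side: puzzle (t, x[1..l-k], y); no per-client state is kept."""
    x = H(S, t)
    return t, x >> K_BITS, h(x)

def solve(puzzle):
    """Client side: search the k missing low-order bits of x."""
    t, prefix, y = puzzle
    for low in range(1 << K_BITS):
        x = (prefix << K_BITS) | low
        if h(x) == y:
            return t, x, y
    return None

def verify(S, solution, now, lifetime):
    """Server side: recompute x from (S, t) and reject expired t."""
    t, x, y = solution
    fresh = 0 <= now - t <= lifetime
    return fresh and x == H(S, t) and y == h(x)
```

Because the server recomputes $x$ from $S$ and $t$ at verification time, it does not need to remember issued puzzles; freshness is enforced through $t$ alone.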

In the setting above, non-malleability of $H$ surfaces as an important property.
If out of the first two elements $(t, H(S,t))$ of a puzzle solution the adversary can
efficiently construct $(t', H(S,t'))$ for $t' \neq t$, a string which has not yet expired,
then the defense sketched above is rendered useless: the adversary can easily
construct new puzzles (together with their solutions). Requiring that the
function $H$ is non-malleable with respect to the relation $R(s_1, s_2) = 1$ iff
$s_1 = (S, t)$ and $s_2 = (S, t')$ for $t \neq t'$ is sufficient to prevent the
above attack.
Abstract. The hash function Skein is the submission of Ferguson et
al. to the NIST Hash Competition, and is arguably a serious candi- 
date for selection as SHA-3. This paper presents the first third-party 
analysis of Skein, with an extensive study of its main component: the 
block cipher Threefish. We notably investigate near collisions, distin- 
guishers, impossible differentials, key recovery using related-key differ- 
ential and boomerang attacks. In particular, we present near collisions 
on up to 17 rounds, an impossible differential on 21 rounds, a related-key 
boomerang distinguisher on 34 rounds, a known-related-key boomerang 
distinguisher on 35 rounds, and key recovery attacks on up to 32 rounds, 
out of 72 in total for Threefish-512. None of our attacks directly extends 
to the full Skein hash. However, the pseudorandomness of Threefish is re- 
quired to validate the security proofs on Skein, and our results conclude 
that at least 36 rounds of Threefish seem required for optimal security 
guarantees. 


1 Introduction 


The hash function research scene has seen a surge of works since devastating
attacks [1,2,3,4] on the two most deployed hash functions, MD5 and SHA-1.
This led to a lack of confidence in the current U.S. (and de facto worldwide)
hash standard, SHA-2 [5], because of its similarity with MD5 and SHA-1.

As a response to the potential risks of using SHA-2, the U.S. National Institute
of Standards and Technology (NIST) launched a public competition, the NIST
Hash Competition, to select a new hash standard [6]. The new hash function,
SHA-3, is expected to have at least the security of SHA-2, and to achieve this
with significantly improved efficiency. By the deadline of October 2008, NIST
received 64 submissions, of which 51 were accepted as first-round candidates;
in July 2009, 14 were selected as second-round candidates, including Skein. Due
to the critical role of hash functions in security protocols, this competition
has drawn attention not only from academia, but also from industry (with
candidates from IBM, Hitachi, Intel, and Sony) and from governmental
organizations.

Skein [7] is the submission of Ferguson et al. to the NIST Hash Competition.
According to its designers, it combines "speed, security, simplicity and a great
deal of flexibility in a modular package that is easy to analyze" [7, p. i]. Skein
supports three different internal state sizes (256-, 512-, and 1024-bit), and is one
of the fastest contestants on 64-bit machines.

Skein is based on the "UBI (Unique Block Iteration) chaining mode" that
itself uses a compression function made out of the Threefish-512 block cipher.
Below we give a brief top-down description of these components:


• Skein makes three invocations to the UBI mode with different tags: the
  first hashes the configuration block with a tag "Cfg", the second hashes
  the message with a tag "Msg", and the third hashes a null value with a
  tag "Out".

• UBI mode hashes an arbitrary-length string by iterating invocations to
  a compression function, which takes as input a chaining value, a message
  block, and a tweak. The tweak encodes the number of bytes processed
  so far, and special flags for the first and the last block.

• The compression function inside the UBI mode is the Threefish-512
  block cipher in MMO (Matyas-Meyer-Oseas) mode, i.e., from a chaining
  value $h$, a message block $m$, and a tweak $t$ it returns
  $E_h(t, m) \oplus m$ as new chaining value.

• Threefish is a family of tweakable block ciphers based on a simple
  permutation of two 64-bit words:
  $\mathrm{MIX}(x, y) = (x + y,\ (x + y) \oplus (y \lll R))$.
  Threefish-512 is the version of Threefish with 512-bit key and 512-bit
  blocks, and is used in the default version of Skein.


So far, no third-party cryptanalysis of Skein has been published, and the only
cryptanalytic results are in its documentation [7, §9]. It describes a near collision
on eight rounds for the compression function, a distinguisher for 17 rounds of
Threefish, and it conjectures the existence of key recovery attacks on 24 to 27
rounds (depending on the internal state size). Furthermore, [7, §9] discusses the
possibility of a trivial related-key boomerang attack on a modified Threefish, and
concludes that it cannot work on the original version. A separate document
presents proofs of security for Skein when assuming that some of its components
behave ideally (e.g., that Threefish is an ideal cipher).

This paper presents the first external analysis of Skein, with a focus on the
main component of its default version: the block cipher Threefish-512. Table 1
summarizes our results.
Table 1. Summary of the known results on Threefish-512 (near collisions are for
Threefish-512 in MMO mode, related-key boomerang attacks make use of four
related keys, "√" designates the present paper)


Rounds  Time       Memory     Type                                        Authors
8       1          -          511-bit near-collision                      [7]
16      [garbled]  -          459-bit near-collision                      √
17      [garbled]  -          434-bit near-collision                      √
17      [garbled]  -          related-key distinguisher*                  [7]
21      [garbled]  -          related-key distinguisher                   √
21      [garbled]  -          related-key impossible differential         √
25      ?          -          related-key key recovery (conjectured)      [7]
25      [garbled]  -          related-key key recovery                    √
26      [garbled]  -          related-key key recovery                    √
32      [garbled]  [garbled]  related-key boomerang key recovery          √
34      [garbled]  -          related-key boomerang distinguisher         √
35      [garbled]  -          known-related-key boomerang distinguisher   √


*: complexity deduced from the biases in [7, Tab. 22].


The rest of the paper is organized as follows: §2 describes Threefish-512; §3
studies near-collisions for Skein's compression function with a reduced
Threefish-512; §4 describes impossible differentials; §5 discusses and improves
the key-recovery attacks sketched in [7, §9.3]. Finally, §6 uses the boomerang
technique to describe our best distinguishers and key-recovery attacks on
Threefish. §7 concludes.


2 Brief Description of Threefish-512 


Threefish-512 works on 64-bit words, and we write their hexadecimal value in
sans-serif font (e.g., 0123456789ABCDEF). The letter $\Delta$ stands for a
difference in the most significant bit (MSB), i.e., $\Delta$ = 8000000000000000.
Notations are the same as in the specification of Threefish [7, §2.2]: a 512-bit
plaintext block is parsed as eight words $v_{0,0}, \dots, v_{0,7}$, and is
encrypted through $N_r = 72$ rounds, where round number
$d \in \{0, \dots, N_r - 1\}$ operates as follows:


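1. If $d \equiv 0 \bmod 4$, add a subkey by setting
   $e_{d,i} \leftarrow v_{d,i} + k_{d/4,i}$, $i = 0, \dots, 7$;
   otherwise, just copy the state: $e_{d,i} \leftarrow v_{d,i}$, $i = 0, \dots, 7$.
2. Set $(f_{d,2i}, f_{d,2i+1}) := \mathrm{MIX}_{d,i}(e_{d,2i}, e_{d,2i+1})$,
   $i = 0, \dots, 3$, where

   $$\mathrm{MIX}_{d,i}(x, y) = (x + y,\ (x + y) \oplus (y \lll R_{d,i})),$$

   with $R_{d,i}$ a rotation constant dependent on $d$ and $i$.
3. Permute the state words:

   $$
   \begin{array}{llll}
   v_{d+1,0} \leftarrow f_{d,2} & v_{d+1,1} \leftarrow f_{d,1} & v_{d+1,2} \leftarrow f_{d,4} & v_{d+1,3} \leftarrow f_{d,7} \\
   v_{d+1,4} \leftarrow f_{d,6} & v_{d+1,5} \leftarrow f_{d,5} & v_{d+1,6} \leftarrow f_{d,0} & v_{d+1,7} \leftarrow f_{d,3}
   \end{array}
   $$

After $N_r \equiv 0 \bmod 4$ rounds, the ciphertext is set to

$$(v_{N_r,0} + k_{N_r/4,0}), \dots, (v_{N_r,7} + k_{N_r/4,7}).$$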


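The $s$-th keying (counting from zero, which thus occurs at round $d = 4s$) uses
subkeys $k_{s,0}, \dots, k_{s,7}$. These are derived from the key
$k_0, \dots, k_7$ and from the tweak $t_0, t_1$ as

$$
\begin{array}{ll}
k_{s,0} = k_{(s+0) \bmod 9} & k_{s,4} = k_{(s+4) \bmod 9} \\
k_{s,1} = k_{(s+1) \bmod 9} & k_{s,5} = k_{(s+5) \bmod 9} + t_{s \bmod 3} \\
k_{s,2} = k_{(s+2) \bmod 9} & k_{s,6} = k_{(s+6) \bmod 9} + t_{(s+1) \bmod 3} \\
k_{s,3} = k_{(s+3) \bmod 9} & k_{s,7} = k_{(s+7) \bmod 9} + s
\end{array}
$$

where $k_8 = \mathtt{5555555555555555} \oplus \bigoplus_{i=0}^{7} k_i$ and
$t_2 = t_0 \oplus t_1$.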
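
As an illustration of the description above, the following sketch implements the key schedule, the MIX function, and the word permutation, and computes how an MSB difference in $k_7$ and $t_0$ propagates to the subkeys. It is a sketch under stated assumptions: the rotation amount is passed in as a parameter because the actual rotation constants $R_{d,i}$ are not listed in this section, and the function names are hypothetical.

```python
MASK = (1 << 64) - 1
C5 = 0x5555555555555555      # constant used to derive the extra key word k_8
DELTA = 1 << 63              # difference in the most significant bit

def extend(key, tweak):
    """Append k_8 = C5 xor k_0 xor ... xor k_7 and t_2 = t_0 xor t_1."""
    k8 = C5
    for w in key:
        k8 ^= w
    return list(key) + [k8], list(tweak) + [tweak[0] ^ tweak[1]]

def subkey(key, tweak, s):
    """Subkey words k_{s,0..7} of the s-th keying (used at round d = 4s)."""
    k, t = extend(key, tweak)
    sk = [k[(s + i) % 9] for i in range(8)]
    sk[5] = (sk[5] + t[s % 3]) & MASK
    sk[6] = (sk[6] + t[(s + 1) % 3]) & MASK
    sk[7] = (sk[7] + s) & MASK
    return sk

def mix(x, y, r):
    """MIX(x, y) = (x + y, (x + y) xor (y <<< r)) on 64-bit words."""
    total = (x + y) & MASK
    rot = ((y << r) | (y >> (64 - r))) & MASK
    return total, total ^ rot

PERM = (2, 1, 4, 7, 6, 5, 0, 3)   # v_{d+1,i} <- f_{d,PERM[i]}

def permute(f):
    return [f[PERM[i]] for i in range(8)]

def subkey_difference(key, tweak, s):
    """XOR difference of the s-th subkey under a DELTA difference in k_7 and t_0."""
    key2 = list(key)
    key2[7] ^= DELTA
    tweak2 = [tweak[0] ^ DELTA, tweak[1]]
    a, b = subkey(key, tweak, s), subkey(key2, tweak2, s)
    return [u ^ v for u, v in zip(a, b)]
```

Because the difference sits in the MSB, adding it twice cancels modulo $2^{64}$, which is why some subkey words such as $k_7 + t_2$ carry no difference even though both summands do.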


3 Near Collisions for the UBI Compression Function 


We extend the analysis presented in [7, §9] to find near-collisions for the
compression function of Skein's UBI mode; [7, §9] exploits local collisions,
i.e., collisions in the intermediate values of the state, which occur when
particular differences are set in the key, the plaintext, and the tweak.

The compression function outputs $E_h(t, m) \oplus m$, where $E$ is
Threefish-512. Our strategy is simple: as in [7, §9], we prepend a four-round
differential trail to the first local collision at round four so as to avoid
differences until the 13-th round. Then, we follow the trail induced by the
introduced difference.

The next two sections work out the details as follows: 


• §3.1 shows how to adapt the differential trail found in [7, §9] when a
  4-round trail is prepended.

• §3.2 describes the differential trails used and evaluates the probability
  that a random input conforms.

• §3.3 explains how to reduce the complexity of the attack by precomputing
  a single conforming pair for the first 4-round trail, and using some
  conditions to speed up the search.


3.1 Adapting Differences in the Key and the Tweak 


In [7, §9.3.4], Skein's designers suggest prepending a 4-round trail that leads
to the difference $(0, 0, \dots, 0, \Delta)$, previously used for the 8-round
collision. However, the technique as it is presented does not work. This is
because the order of keyings is then shifted, and so the original difference in
the key and in the tweak does not cancel the $(0, 0, \dots, 0, \Delta)$
difference at the second keying.

Therefore, for differences to vanish at the third keying, one needs a difference
$\Delta$ in $k_7$ and $t_0$, which gives a difference $(0, \dots, 0, \Delta)$ at
the second keying, and $(0, 0, 0, 0, \Delta, 0, 0, 0)$ after the fourth. The
difference in the state after (4+8) rounds is thus the same as originally after
eight rounds. Note that, as observed in [7, §9.4], at least seven keyings
separate two vanishing keyings. See Table 2 for details.
Table 2. Details of the subkeys and of their differences, given a difference Δ in k7 and
t0 (leading to Δ differences in k8 and t2). For each keying s (at round d), the first
line lists the subkey words and the second line their differences.


s  d   ks,0  ks,1  ks,2  ks,3  ks,4  ks,5    ks,6    ks,7

0  0   k0    k1    k2    k3    k4    k5+t0   k6+t1   k7
       0     0     0     0     0     Δ       0       Δ

1  4   k1    k2    k3    k4    k5    k6+t1   k7+t2   k8+1
       0     0     0     0     0     0       0       Δ

2  8   k2    k3    k4    k5    k6    k7+t2   k8+t0   k0+2
       0     0     0     0     0     0       0       0

3  12  k3    k4    k5    k6    k7    k8+t0   k0+t1   k1+3
       0     0     0     0     Δ     0       0       0

4  16  k4    k5    k6    k7    k8    k0+t1   k1+t2   k2+4
       0     0     0     Δ     Δ     0       Δ       0

5  20  k5    k6    k7    k8    k0    k1+t2   k2+t0   k3+5
       0     0     Δ     Δ     0     Δ       Δ       0

6  24  k6    k7    k8    k0    k1    k2+t0   k3+t1   k4+6
       0     Δ     Δ     0     0     Δ       0       0


3.2 Differential Trails 


We now trace the difference when prepending four rounds, i.e., when the differ- 
ence is in ky and in to only (and in the plaintext). 


4-Round Trail. To prepend four rounds and reach the difference (0,...,0,Δ),
one uses the trail provided in the full version of this paper. The plaintext
difference is modified by the first keying (the MSB differences in the sixth and
eighth words vanish). The probability that a random input successfully crosses
the 4-round differential trail is 2^{-33} (either forward or backward).


12-Round Trail. The second keying adds Δ to the last state word, making its
difference vanish. The state remains free of any difference up to the fourth keying,
after the twelfth round, which sets a difference Δ in the fifth state word. Table 3
presents the corresponding trail up to the 17-th round. After 17 rounds, the
weight becomes too large to obtain near collisions. On 16 rounds, adding the
final keying and the feedforward, one obtains a collision on 512 − 53 = 459 bits.
Likewise, for 17 rounds, a collision can be found on 512 − 78 = 434 bits.


3.3 Optimizing the Search 


A direct application of the differential trails in the previous section gives a cost
2^{33} to cross the first four rounds; then, after the twelfth round,

Table 3. Differential trail (linearization) used for near collisions, of probability 2^{-24}


Rd   Difference                                                             Pr

13   0000000000000000 0000000000000000 8000000000000000 0000000000000000   1
     0000000000000000 8000000000000000 0000000000000000 0000000000000000

14   8000000000000000 0000000000000000 8000000000000000 0000000000000000   1
     0000000000000000 8000010000000000 0000000000000000 8000000000000000

15   8000000000000000 8000000000000000 8000010000000000 8000000000000100   2^{-1}
     8000000000000000 8008010000000400 8000000000000000 8000000000000000

16   0000010000000100 0000000100000000 0008010000000400 0000000400000000   2^{-5}
     0000000000000000 000A014004008400 0000000000000000 0804010000000100

17   8008010400000400 0000010100000140 800A014004008400 A805018020000100   2^{-18}
     8804010000000100 900A016801009402 0000010100000100 8008010420000401



• With 16 rounds: complexity is 2^{1+5} = 2^{6}, so 2^{39} in total, for finding a
collision over 459 bits.

• With 17 rounds: complexity is 2^{1+5+18} = 2^{24}, so 2^{57} in total, for finding
a collision over 434 bits.


A simple trick allows us to avoid the cost of crossing the first 4-round trail: note
that the first keying adds (k5 + t0) to the sixth state word, and (k6 + t1) to
the seventh; hence, given one conforming pair, one can modify k5, k6, t0, t1 while
preserving the values of (k5 + t0) and (k6 + t1), and the new input will also follow
the differential trail. It is thus sufficient to precompute a single conforming pair
to avoid the cost due to the prepended rounds.

To carry out this precomputation efficiently, a considerable speedup of the
2^{33} complexity can be obtained by finding sufficient conditions to cross the first
round with probability one (instead of 2^{-21}):


• A first set of conditions is on the words (v_{2i}, v_{2i+1}): whenever there is a
nonzero difference at the same offset, the bit should have a different value
in the first and in the second word (otherwise carries induce additional
differences).

• A second set of conditions concerns the differences that do not “collide”:
one should ensure that no carry propagates from the leftmost bits.


In total, there are 13 + 8 = 21 such conditions, which leaves enough degrees of
freedom to satisfy the subsequent differential trails. Using techniques like neutral
bits [10], the complexity may be reduced further, but 2^{12} is already low
enough for efficiently finding a conforming pair. By choosing inputs according to
the above conditions, while being careful to avoid contradictions, we can find a
pair that conforms within a few thousand trials (see Appendix A for an example).

We can now use this pair to search for near collisions. It suffices to pick random
values for k5 and k6, then set t0 = −k5 and t1 = −k6 to get a set of 2^{128} distinct
inputs. Experiments were consistent with our analysis, and examples of near
collisions are given in Appendix B.
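
As a small illustration of this compensation (a sketch under the assumption that words are 64-bit and all arithmetic is modulo 2^64; the function name `compensate` is ours), the tweak words are simply the two's complement negations of the chosen key words, so that k5 + t0 and k6 + t1 keep the value zero they have for the precomputed pair:

```python
MASK = (1 << 64) - 1


def compensate(k5, k6):
    """Return (t0, t1) such that k5 + t0 = 0 and k6 + t1 = 0 modulo 2^64."""
    return (-k5) & MASK, (-k6) & MASK


if __name__ == "__main__":
    # Key words of the first input in Appendix B.
    k5, k6 = 0xC0DEC0DEC0DEC0DE, 0x6B9B2C1000000000
    t0, t1 = compensate(k5, k6)
    print("%016X %016X" % (t0, t1))
```

With the key words of the first input of Appendix B, this reproduces the tweak words t0 and t1 listed there.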


3.4 Improved Distinguisher


Based on our trick to cross the first twelve rounds “for free”, we can improve the
distinguisher suggested in [7]. This distinguisher exploited the observation of a
bias 0.01 < ε < 0.05 after 17 rounds (thus leading to a distinguisher requiring at
least 1/0.05^2 = 400 samples). [7] suggested combining it with the prepending of
four rounds, though no further details were given. Our observations show that
with the adapted difference in the key and the tweak, a bias about 0.3 exists 
at the 385-th bit, after 21 rounds. We detected this bias using a frequency test 
similar to that in [LJ] §§2.1]. This directly gives a distinguisher on 21 rounds, 
and requires only about 1/0.3^2 ≈ 11 samples.
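
The two sample counts follow from the usual rule of thumb that detecting a bias ε takes on the order of 1/ε^2 samples. A one-line sketch (the function name is ours, and the constant factor hidden by the rule of thumb is ignored):

```python
def samples_needed(bias):
    """Order of magnitude of the number of samples needed to detect a bias."""
    return 1.0 / bias ** 2


if __name__ == "__main__":
    print(round(samples_needed(0.05)))  # bias bound after 17 rounds
    print(round(samples_needed(0.3)))   # bias after 21 rounds with the adapted difference
```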


4 Impossible Differentials 


The miss-in-the-middle technique (a term coined by Biham et al. in A), was 
first applied by Knudsen to construct a 5-round impossible differential for 
the block cipher DEAL. The idea was generalized by Biham et al. to find 
impossible differentials for ciphers of any structure. The idea is as follows: Consider
a cascade cipher E = E^β ∘ E^α such that for E^α there exists a differential
(Δ^α_in → Δ^α_out) and for (E^β)^{-1} there exists a differential (Δ^β_in → Δ^β_out), both
with probability one, where the equality is impossible (Δ^α_out ≠ Δ^β_out). It follows
that the differential (Δ^α_in → Δ^β_in) cannot occur, for it requires Δ^α_out = Δ^β_out. This
technique can be extended to the related-key setting. For example, related-key
impossible differentials were found for 8-round AES-192 [A05].

Below we first present probability-1 truncated differentials on the first 13 
rounds (forward) and on the last seven rounds (backward) of 20-round Threefish- 
512. A “miss-in-the-middle” observation then allows us to deduce the existence 
of impossible differentials on 20 and 21 rounds. 


4.1 Forward Differential 


The first keying (s = 0) adds to the state v_{0,0},...,v_{0,7} the values k0, k1,..., k4,
k5 + t0, k6 + t1, k7. Then, the second keying (s = 1) adds k1,..., k5, k6 + t1,
k7 + t2, k8 + 1. By setting a difference Δ in k6, k7, t1 and in the plaintext word v_{0,7},
we ensure that differences vanish in the first two keyings, and thus nonzero
differences only appear after the eighth round, at the third keying.

The third keying (s = 2) adds k2,..., k6, k7 + t2, k8 + t0, k0 + 2. Hence the
difference Δ is introduced in e_{8,4} only. It gives a difference Δ in f_{8,4}, f_{8,5}, thus in
v_{9,2}, v_{9,5}. After the tenth round, the state v_{10,·} has the following difference with
probability one.


8000000000000000 0000000000000000 8000000000000000 0000000000000000 
0000000000000000 8000040000000000 0000000000000000 8000000000000000 . 

After the twelfth round (before the fourth keying), the state v_{12,·} again has some
differences that occur with probability one (the X differences are uncertain, that
is, have probability strictly below one):

XXXXXXXXX4000000 0000000002000000 XXXXXXXXXXXX4000 0000000000000040
0000000000000000 XXXXXXXXXXXXX100 0000000000000000 XXXXXXXXX4000800 . 


Given this class of differences, after the 13-th round (which starts by making 
the fourth keying) we have the class of differences 


XXXXXXXXXXXXXX40 XXXXXXXXX2000000 XXXXXXXXXXXXX100 XXXXXXXXXXXXXX10 
XXXXXXXXXXXXX800 XXXXXXXXXXXXXXXX XXXXXXXXX2000000 XXXXXXXXXXXXXX40 . 


There are in total 92 bits with probability-1 differences between the 13-th and 
the 14-th round. These differences were empirically verified. 


4.2 Backward Differential


The sixth keying (s = 5), which occurs after the 20-th round, returns the
ciphertext

c0 = v_{20,0} + k5          c4 = v_{20,4} + k0
c1 = v_{20,1} + k6          c5 = v_{20,5} + k1 + t2
c2 = v_{20,2} + k7          c6 = v_{20,6} + k2 + t0
c3 = v_{20,3} + k8          c7 = v_{20,7} + k3 + 5

By setting a difference Δ in k6, k7, t1 (as for the forward differential), and in the
ciphertext words c1, c2, c5, we ensure that differences vanish in the sixth keying,
and thus nonzero differences only appear after the 17-th round, when making
the fifth keying (by computing backwards from the 20-th round).

The fifth keying (s = 4), after the 16-th round, subtracts from the state the
values k4,...,k8, k0 + t1, k1 + t2, k2 + 4. Hence, the difference Δ is introduced
(backwards) in v_{16,2}, v_{16,3}, v_{16,5}, v_{16,6}. After inverting the 16-th round, we obtain
with probability one the difference


XXXXXXXX40000000 0000000040000000 0000000000000000 0000000000000000 
8000000000000000 0000000000000000 XXXXXXXX10000000 0000000010000000 . 


Finally, after inverting the 14-th round, we have the following difference with
probability one:


XXXXXXXXXXXX8000 XXXXXXXXXXXX8000 XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX 
XXXXXXXXXXXXXXXX XXXXXXXXXX400000 XXXXXXXXXX800000 XX50000000800000 . 


In total there are 134 bits of difference with probability one between the 14-th 
and the 13-th round. 


4.3 Miss-in-the-Middle 


We showed that if there is a difference Δ in the key words k6 and k7, and in the
tweak word t1, then a difference Δ in the plaintext word v_{0,7} propagates to give
probability-1 differences after up to 13 rounds. Then we showed that for the
same difference in the key and in the tweak, a difference Δ in the ciphertext
words c1, c2, c5 guarantees (with probability one) that between the 13-th and the
14-th rounds we also have probability-1 differences.

Looking for example at the first word of the state: the forward differential
leads to a difference in the seventh bit, whereas the backward differential requires
this bit to be unchanged. Therefore, it is impossible that a difference Δ in the
plaintext word v_{0,7} leads to a difference Δ in c1, c2, c5 with 20-round Threefish-512.
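
The contradiction can be located mechanically from the two truncated differences given in §4.1 (after the 13-th round) and §4.2 (after inverting the 14-th round). The sketch below is ours and relies on one reading of the notation that the text does not spell out: in each word, the bits from bit 0 up to the highest set bit of the printed hex digits are taken as known, and everything above is uncertain. This reading is at least consistent with the counts of 92 and 134 probability-1 bits stated above.

```python
FORWARD = ("XXXXXXXXXXXXXX40 XXXXXXXXX2000000 XXXXXXXXXXXXX100 XXXXXXXXXXXXXX10 "
           "XXXXXXXXXXXXX800 XXXXXXXXXXXXXXXX XXXXXXXXX2000000 XXXXXXXXXXXXXX40").split()
BACKWARD = ("XXXXXXXXXXXX8000 XXXXXXXXXXXX8000 XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX "
            "XXXXXXXXXXXXXXXX XXXXXXXXXX400000 XXXXXXXXXX800000 XX50000000800000").split()


def known_bits(pattern):
    """Return (value, n): the n low bits of the difference are known.

    Assumption: bits are known from bit 0 up to the highest set bit."""
    value = int(pattern.replace("X", "0"), 16)
    return value, value.bit_length()


def count_known(patterns):
    return sum(known_bits(p)[1] for p in patterns)


def conflicts(forward, backward):
    """List (word, bit) positions known on both sides with different values."""
    found = []
    for i, (pf, pb) in enumerate(zip(forward, backward)):
        vf, nf = known_bits(pf)
        vb, nb = known_bits(pb)
        clash = (vf ^ vb) & ((1 << min(nf, nb)) - 1)
        found += [(i, b) for b in range(64) if (clash >> b) & 1]
    return found


if __name__ == "__main__":
    print(count_known(FORWARD), count_known(BACKWARD))
    print(conflicts(FORWARD, BACKWARD))
```

A single conflicting position is enough for the miss-in-the-middle argument; the pair (0, 6) reported by `conflicts` is the seventh bit of the first word discussed above.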

We can extend this impossible differential one more round: after the 20-th
round and the sixth keying the state has only differences Δ in e_{20,1}, e_{20,2}, e_{20,5}.
These differences always give the same difference after the 21-st round, because
they are only in MSBs. This directly gives an impossible differential on 21
rounds of Threefish-512 (e.g., 21 out of 72). However, contrary to the 20-round
impossible differential, it is irrelevant to Threefish-512 with exactly N_r = 21
rounds, because of the final keying that occurs after the 21-st round (which
makes some differences uncertain, because before the keying we have differences
in non-MSBs).


5 Improved Key-Recovery Attacks 


The documentation of Skein sketches key-recovery attacks on all Threefish ver- 
sions, though the complexity is not studied. We analyzed these observations, and 
could find better attacks than conjectured by the Skein designers. 

To optimize the attack strategy in [7, §9.3], the attacker has to determine
which key bits should be guessed. This is to minimize the noise over the bias
after a partial inversion of the last rounds, and thus to minimize the complexity
of the attack. The fewer key bits guessed, the better for the attacker (up to the
bound of half the key bits). One can easily determine which key bits do not
affect the bias when inverting one or two rounds. For example, two rounds after
round 21 (where the bias occurs), the 385-th bit does not affect the second,
third, fourth, and sixth state words. Hence, it is not affected by a wrong guess
of the key words k0, k2, k6. The bias is slightly affected by erroneous guesses of
k3 (which modifies the last state word in the keying), but it is still large (about
0.12 ≈ 2^{-3}). It is thus sufficient to guess half the key (k1, k4, k5, k7) to be able
to observe the bias.

Note that the cost of the prepended rounds depends on which key words
are guessed: indeed, when guessing a word, one can adapt the corresponding
plaintext word in order to satisfy the conditions of the differential. Here the
non-guessed words imply a cost 2^{12+18} = 2^{30} to cross the first differential. The total
cost of recovering the 512-bit key on 23 rounds is thus about 2^{30} × 2^{6} × 2^{256} = 2^{292}.

To attack more rounds, a more advanced search for the optimal set of bits to
be guessed is likely to reduce the complexity of our attacks. For this, we used the
same strategy as in the analysis of the Salsa20 and ChaCha stream ciphers [16].
Namely, we computed the neutrality of each key bit (i.e., the probability that 
flipping the bit preserves the difference), and we chose to guess the bits that affect 
the bias the most, using some threshold on their neutrality. More precisely, we 
sort key bits according to their neutrality, then filter them with respect to some 


Improved Cryptanalysis of Skein 551 


threshold value. In the terminology of [16], this corresponds to partitioning
the key bits into “significant” and “non-significant” ones.

Recall that in §3.4 we observed a bias at the 385-th bit after 4 + 17 rounds
of Threefish-512. A key-recovery attack on 21 + n rounds consists in guessing
some key bits, inverting n rounds based on this guess, letting the other key bits
be random, and observing a bias in that bit. Complexity is determined by the
number of guessed bits and the value of the observed bias.

Inverting four rounds while leaving random all key bits whose neutrality is greater
than 0.29 (we found 125 of those), we observe a bias 0.0365. Since some key bits are not
guessed, and thus assumed random, some of the conditions to conform to the
first round’s differential cannot be controlled. There are eight such additional
conditions, which means that the 4-round initial differential will be followed
with probability 2^{-12-8}. Since our bias approximately equals 2^{-4.8}, and since
we need to guess 512 − 125 = 387 key bits, the overall complexity of the attack on
25-round Threefish-512 is about 2^{12+8} × 2^{2×4.8} × 2^{387} = 2^{416.6}. Below we give
the mask corresponding to the 125 non-guessed bits, for each key word:


0000070060FFF836 0040030021FFFC0E 803C02F03FFFF83F 001001001603C006
00780E30007F000E 0000000000000000 0000000000000000 007001800E03F801 .


We can apply the same method on 26 rounds: with a neutrality threshold 0.17
we obtain 30 “significant” key bits, and we observe a bias about 0.017 when
all of them are random. The non-guessed bits give two additional conditions for
the first 4-round differential. In total, the complexity of the attack is thus about
2^{12+2} × 2^{2×5.9} × 2^{482} = 2^{507.8}. Memory requirements are negligible.
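
The arithmetic behind these two estimates is easy to recheck. The sketch below is ours: it takes the cost model used above (2^c trials to satisfy the c conditions of the first differential, times bias^{-2} samples, times 2^g key guesses) and also counts the bits set in the mask of non-guessed bits. The names `attack_log2` and `NON_GUESSED_MASK` are illustrative.

```python
import math

# Mask of non-guessed key bits for the 25-round attack, one word per key word.
NON_GUESSED_MASK = [
    0x0000070060FFF836, 0x0040030021FFFC0E, 0x803C02F03FFFF83F, 0x001001001603C006,
    0x00780E30007F000E, 0x0000000000000000, 0x0000000000000000, 0x007001800E03F801,
]


def popcount(mask_words):
    """Total number of bits set in a list of 64-bit words."""
    return sum(bin(w).count("1") for w in mask_words)


def attack_log2(conditions, bias, guessed_bits):
    """log2 of the cost: 2^conditions * bias^-2 * 2^guessed_bits."""
    return conditions - 2 * math.log2(bias) + guessed_bits


if __name__ == "__main__":
    free = popcount(NON_GUESSED_MASK)
    print(free, 512 - free)
    print(round(attack_log2(12 + 8, 0.0365, 512 - free), 1))  # 25 rounds
    print(round(attack_log2(12 + 2, 0.017, 482), 1))          # 26 rounds
```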


6 Boomerang Attacks 


Boomerang attacks were introduced by Wagner and first applied to block ciphers [7].
Roughly speaking, in boomerang attacks one uses two short differential
trails rather than a long one to exploit the efficiency of the former trails.
Let E denote the encryption function of Threefish. View E as a cascade of four
subciphers

E = E^ω ∘ E^γ ∘ E^β ∘ E^α, (1)


so that E is composed of a core E' = E^γ ∘ E^β sandwiched by rounds E^α and
E^ω. The boomerang distinguisher is generally described for E' only, but for
key-recovery attacks on Threefish we need to generalize the attack to the construction
in Eq. (1).

Recall that in related-key attacks, one assumes that the attacker can query
the cipher with other keys that have some specified relation with the original
key. This relation is often an XOR-difference. A related-key differential is thus
a triplet (Δ_in, Δ_out, Δ_k), associated with the probability


Pr[E_k(m) ⊕ E_{k⊕Δ_k}(m ⊕ Δ_in) = Δ_out] = p.


Here, Δ_in and Δ_out are the input and output differences, Δ_k is the key difference,
and p the probability of the differential.

For (related-key) boomerang attacks based on four related keys, one exploits
two short related-key differentials: (Δ^β_in, Δ^β_out, Δ^β_k) for E^β, of probability p, and
(Δ^γ_in, Δ^γ_out, Δ^γ_k) for E^γ, of probability q.


A distinguisher then works as follows:


1. Pick a random plaintext m1 and form m2 = m1 ⊕ Δ^β_in.
2. Obtain c1 = E_k(m1) and c2 = E_{k⊕Δ^β_k}(m2).
3. Set c3 = c1 ⊕ Δ^γ_out and c4 = c2 ⊕ Δ^γ_out.
4. Obtain m3 = E^{-1}_{k⊕Δ^γ_k}(c3) and m4 = E^{-1}_{k⊕Δ^β_k⊕Δ^γ_k}(c4).
5. Check m3 ⊕ m4 = Δ^β_in.


For an ideal cipher, the final equality is expected to hold with probability 2^{-n},
where n is the block length. The probability of the related-key boomerang distinguisher,
on the other hand, is approximately p^2 q^2 (see [17,18,19,20] for details).

Note that the boomerang attack can be generalized to exploit multiple differentials.
The success probability then becomes p̂^2 q̂^2, where p̂ and q̂ are the square
roots of the sums of the squares of the probabilities of the differentials exploited.¹
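
The five steps of the distinguisher can be sketched on a toy example. The cipher below is NOT Threefish: it is a deliberately XOR-linear 64-bit toy cipher of our own (names `toy_encrypt`, `toy_decrypt`, `boomerang_returns` are illustrative), chosen so that every related-key differential holds with probability one and the boomerang always returns. For a real ARX cipher the final check would instead succeed with probability about p^2 q^2.

```python
import random

MASK = (1 << 64) - 1


def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def rotr(x, r):
    return rotl(x, 64 - r)


def toy_encrypt(key, m):
    """XOR-linear toy cipher used only to illustrate the procedure."""
    return rotl(m ^ key, 13) ^ rotl(key, 7)


def toy_decrypt(key, c):
    return rotr(c ^ rotl(key, 7), 13) ^ key


def boomerang_returns(k, d_in, dk_beta, d_out, dk_gamma, rng):
    """Run steps 1-5 of the related-key boomerang once."""
    m1 = rng.getrandbits(64)                        # step 1
    m2 = m1 ^ d_in
    c1 = toy_encrypt(k, m1)                         # step 2
    c2 = toy_encrypt(k ^ dk_beta, m2)
    c3 = c1 ^ d_out                                 # step 3
    c4 = c2 ^ d_out
    m3 = toy_decrypt(k ^ dk_gamma, c3)              # step 4
    m4 = toy_decrypt(k ^ dk_beta ^ dk_gamma, c4)
    return (m3 ^ m4) == d_in                        # step 5


if __name__ == "__main__":
    rng = random.Random(2009)
    k = rng.getrandbits(64)
    msb = 1 << 63
    hits = sum(boomerang_returns(k, msb, msb, 0x1234, 0x10, rng) for _ in range(1000))
    print(hits, "returning boomerangs out of 1000")
```

Because the toy cipher is linear over XOR, the count printed is always the number of trials; the point of the sketch is only the order of the queries and the four related keys involved.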


6.1 Exploiting Nonlinear Differentials 


Differentials are often found via linearization, i.e., assuming that integer additions
behave as XORs. One then evaluates the probability of the differential
with respect to the probability that each active addition behaves as XOR. This
probability equals 2^{-w}, where w is the Hamming weight of the logical OR of the
two difference masks, excluding the MSB.
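
This rule can be checked exhaustively on small words. The sketch below is ours and works on 8-bit words rather than the 64-bit words of Threefish, purely to keep the enumeration cheap: it counts, over all input pairs, how often an addition maps the input differences (a, b) to the XOR difference a ⊕ b, and compares the result with 2^{-w}.

```python
def linear_weight(a, b, n):
    """Hamming weight of (a OR b) with the most significant bit excluded."""
    return bin((a | b) & ((1 << (n - 1)) - 1)).count("1")


def exact_probability(a, b, n):
    """Fraction of n-bit pairs (x, y) for which addition behaves as XOR
    on the differences (a, b), i.e. the output difference is a xor b."""
    mask = (1 << n) - 1
    hits = 0
    for x in range(1 << n):
        for y in range(1 << n):
            out_diff = (((x ^ a) + (y ^ b)) ^ (x + y)) & mask
            if out_diff == (a ^ b):
                hits += 1
    return hits / (1 << (2 * n))


if __name__ == "__main__":
    for a, b in [(0x80, 0x00), (0x80, 0x80), (0x81, 0x01), (0x41, 0x22)]:
        print("%02X %02X" % (a, b), exact_probability(a, b, 8),
              2.0 ** -linear_weight(a, b, 8))
```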

Yet one is not limited to such “linear” differentials, and the best differential
(in terms of probability) is not necessarily a linearization, as illustrated by the
work of Lipmaa and Moriai [21]: for integer addition, they presented efficient
algorithms for computing the probability of any differential, and for finding the
optimal differential. The problem was later studied using formal rational series
with linear representation [22].

We used the algorithms in [21] to find the differentials of our boomerang
attacks. Note that it is not guaranteed that our trails are optimal, for the combination
of locally optimal differential trails (with respect to their probability)
may contribute to a faster increase of the weight than (not necessarily optimal)
linear differentials. Yet our best differentials are not completely linear.


6.2 Related-Key Distinguishers 


As in our previous attacks, we exploit differences in the key and in the plaintext
that vanish until the twelfth round (both for the forward and backward
differentials). Then, we follow a nonlinear differential trail until the middle of
the cipher, i.e., between the 16-th and 17-th rounds. Our differential trail for E^β
has probability p = 2^{-86}, and the one for E^γ has probability q = 2^{-113}, leading to a
boomerang distinguisher on 34 rounds requiring about (pq)^{-2} = 2^{398} trials (see
full version [9]). Note that for the second part, MSB differences are set in the
key words k2 and k3, and in the tweak words t0 and t1 (thus giving no difference
in the seventh subkey).

¹ Throughout the paper, our differentials do not make use of this multiple differential
approach. One can further improve upon the differentials provided in this work by
using this technique.


6.3 Known-Related-Key Distinguishers 


Although the standard notion of distinguisher requires a secret (key), the notion 
of known-key distinguisher is also relevant to set apart a block cipher from 
a randomly chosen permutation. Moreover, when a block cipher is used within a 
compression function, as Threefish is, known-key distinguishers may lead to dis- 
tinguishers for the hash function because all inputs are known to the adversary. 
If differences in the keys are used, we shall thus talk of known-related-key distinguishers.
An example of such a distinguisher is the exhibition of input/output pairs
that have some specific relation, as presented in [23] for seven rounds of AES-128.
Here, we shall consider tuples (m1, m2, m3, m4, c1, c2, c3, c4) that satisfy the
boomerang property.

To build a known-related-key boomerang distinguisher on Threefish, we con- 
sider the decryption function, i.e., we start from the end of the cipher: when the 
key is known, the attacker can easily find a ciphertext that conforms to the first 
differential (e.g., to the weight-83 differential at round 35), which we could ver- 
ify experimentally. In other words, the final differential (including the differences 
caused by the final key) is “free” when launching the boomerang. When it returns, 
however, the 2^{83} factor cannot be avoided if we want to exactly follow the differential
(which is not strictly necessary to run a distinguisher). We thus obtain a
distinguisher on 35-round Threefish-512 with complexity 2°° times that of the
related-key distinguisher on 34 rounds, that is, approximately 247° encryptions.

Several tricks may be used to obtain a similar distinguisher at a reduced cost. 
For example, observing that the first and fourth (resp. second and third) MIX 
functions of round 34 depend only on the first and second (resp. third and fourth) 
MIX’s of round 35, one can speed-up the search for inputs conforming to the 
first two rounds of the boomerang. 


6.4 Extension to Key-Recovery 


We now show how to build a key-recovery attack on top of a boomerang distin- 
guisher for 32-round Threefish-512. We present some preliminary observations 
before describing and analyzing our attack. 

Using the notation of Eq. (1): E^β starts from the beginning and ends after the
key addition in round 16, and E^γ starts from round 17 and ends just before the
key addition after round 32. Our goal is to recover the last subkey. Restricted to
32 rounds, the boomerang distinguisher has probabilities p = 2^{-86} for E^β and
q = 2^{-37} for E^γ, yielding an overall boomerang probability of p^2 q^2 = 2^{-246}. We
now introduce some notions required to facilitate the analysis of our attack.

Definition 1 (CS-sequence). Let δ be a 64-bit word of Hamming weight 0 <
w < 64. The CS-sequence of δ is

S_δ = (|s_0|, |s_1|, ..., |s_{w−1}|),

where |s_i| is the bit length of the i-th block of consecutive zeros in δ finishing
with a one.


For example, for δ = 1000010402000000 we have

δ = 0001 0000 0000 0000 0000 0001 0000 0100 0000 0010 0000 ··· 0000,

with blocks s_0 = 0001, s_1 = 0000 0000 0000 0000 0001, s_2 = 0000 01 and
s_3 = 00 0000 001, and so the CS-sequence of δ is S_δ = (|s_0|, |s_1|, |s_2|, |s_3|) = (4, 20, 6, 9).

The following result, whose proof is provided in the full version of this
paper [9], is used extensively in the key-recovery attack based on the
boomerang distinguisher.


Theorem 1. The number of possible differences N_δ after addition of difference
δ with zero or Δ = 8000000000000000 difference modulo 2^{64} can be directly
computed from the CS-sequence of δ as

N_δ = |s_0| · Σ_{(k_1,...,k_{w−1}) ∈ {0,1}^{w−1}} Π_{i=1}^{w−1} |s_i|^{k_i}.

For instance, if δ = 1000010402000000 then

N_δ = 4 · Σ_{(k_1,k_2,k_3) ∈ {0,1}^3} (20^{k_1} × 6^{k_2} × 9^{k_3})
    = 4 × (1 + 9 + 6 + (6 × 9) + 20 + (20 × 6) + (20 × 9) + (20 × 9 × 6))
    = 4 × 1470 = 5880.

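
Both the CS-sequence and the count of Theorem 1 are straightforward to compute. The sketch below is ours (function names are illustrative): it reads δ from the most significant bit, records the length of each run of zeros closed by a one, and evaluates the sum over (k_1,...,k_{w−1}) literally. For a zero difference it returns 1, which is our convention for the single possible output difference in that case.

```python
from itertools import product


def cs_sequence(delta, bits=64):
    """Lengths of the blocks of zeros finishing with a one, MSB first."""
    lengths, run = [], 0
    for i in range(bits - 1, -1, -1):
        run += 1
        if (delta >> i) & 1:
            lengths.append(run)
            run = 0
    return lengths


def count_differences(seq):
    """Evaluate N = |s_0| * sum over k in {0,1}^(w-1) of prod |s_i|^(k_i)."""
    if not seq:
        return 1  # convention for a zero difference
    total = 0
    for ks in product((0, 1), repeat=len(seq) - 1):
        term = 1
        for s, k in zip(seq[1:], ks):
            term *= s ** k
        total += term
    return seq[0] * total


if __name__ == "__main__":
    delta = 0x1000010402000000
    seq = cs_sequence(delta)
    print(seq, count_differences(seq))
```

The sum over all exponent vectors factors as a product of (1 + |s_i|) terms, so the enumeration could be replaced by a simple product; the literal form is kept here to mirror the statement of the theorem.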


Applying Theorem 1, the number of possible output differences caused
by Δ^γ_out just after the key addition following the related-key boomerang distinguisher
for Threefish-512 is approximately 2^{62}. We obtain this number by
multiplying the number of possibilities for each word of the state (see Table 4).


Table 4. Number of possible output differences after the key addition in Threefish-512,
for each word. Multiplying these numbers, we obtain in total approximately 2^{62}
possible differences.


v_{32,i}    S_Δ                         N_Δ

v_{32,0}    (24, 15)                    384
v_{32,1}    (32)                        32
v_{32,2}    (0)                         1
v_{32,3}    (4, 20, 6, 9)               5880
v_{32,4}    (1)                         1
v_{32,5}    (13, 2, 9, 2, 12, 11, 5)    957840
v_{32,6}    (13, 11, 30)                4836
v_{32,7}    (14)                        14

The Attack. Our attack works in three steps: in the first step, we obtain
quartets satisfying the related-key boomerang relation; in the second, we recover
the partial key by using the possible right quartets obtained from the first step;
the last step is the brute force search of the rest of the key. The attack works as
follows.


1. Find right quartets
   for i = 1,...,2^{248}
   • Generate a random unique pair of chosen plaintexts (m_1^i, m_2^i) with a
     Δ^β_in difference and encrypt each plaintext with keys k^1 and k^2 (having
     a Δ^β_k difference), respectively, to obtain the corresponding ciphertexts
     (c_1^i, c_2^i).
   • for j = 1,...,2^{62}
     ◦ Set c_3^{i,j} = c_1^i ⊕ Δ'_{out,j}, where Δ'_{out,j} is set to the j-th possible
       difference caused by Δ^γ_out.
     ◦ Decrypt c_3^{i,j} with k^3 and obtain the plaintext m_3^{i,j}.
     ◦ Store the values c_3^{i,j} and m_3^{i,j}.
   • for k = 1,...,2^{62}
     ◦ Set c_4^{i,k} = c_2^i ⊕ Δ'_{out,k}, where Δ'_{out,k} is set to the k-th possible
       difference caused by Δ^γ_out.
     ◦ Decrypt c_4^{i,k} with k^4 and obtain the plaintext m_4^{i,k}.
     ◦ Calculate M = m_4^{i,k} ⊕ Δ^β_in and check whether M exists among
       the stored values of m_3^{i,j}. If this is the case, store the possible
       right quartet.
   • Free the memory allocated for the stored values of (possibly wrong)
     c_3^{i,j} and m_3^{i,j}. Increment i.

2. Recover the partial key
   For each ciphertext word having a nonzero difference of a (possibly) right
   quartet (c1, c2, c3, c4) guess the corresponding output whitening key word
   k^1_l for l = 0, 3, 5, 6, and check

   (c_{1,l} − k^1_l) ⊕ (c_{3,l} − k^3_l) = (c_{2,l} − k^2_l) ⊕ (c_{4,l} − k^4_l) = Δ^γ_{out,l},

   where k^2_l = k^1_l ⊕ Δ^β_{k,l} and k^3_l = k^4_l ⊕ Δ^β_{k,l}. If this is the case, store
   this k^1_l.

3. Recover the full key
   Run an exhaustive search of the remaining bits of the subkey.

Complexity Analysis. The goal of step 1 is to find enough quartets satisfying
the related-key boomerang trail. For each of the 2^{248} distinct plaintext-ciphertext pairs
(m1, m2) and (c1, c2), we correspondingly generate 2^{62} new plaintext-ciphertext
pairs (m3, c3) and (m4, c4) by using the possible output differences
given in Table 4. We know that a right quartet has to satisfy one of the possible
output differences Δ'_out; hence we are guaranteed to find the right
quartet once it exists, as we consider all possible combinations. Note that
increasing the number of quartets in that manner does not increase the number of
right quartets, simply because the newly generated plaintext-ciphertext
pairs (m3, c3) and (m4, c4) can only have one root right plaintext-ciphertext
pair (m1, m2) and (c1, c2). Therefore, the expected number of right quartets is
2^{248} · 2^{-246} = 2^{2}. On the other hand, we expect 2^{372} · 2^{-512} = 2^{-140} additional
false quartets.

The first loop at step 1 requires 2°? reduced round Threefish decryptions 
and approximately 270-5 bytes of memory. The second loop can be implemented 
independently and requires 26? reduced round Threefish decryptions and 2° 
memory accesses. On the other hand, we need additional memory complexity 
of 2°°-° bytes for storing A’,,, values. Therefore, the overall complexity of the 
first step is bounded by 2?! reduced round Threefish decryptions and about 2°! 
bytes of memory. Note that the memory requirement for the surviving quartets 
is negligible. 

Step 2 tries to recover the last subkey by using the quartets that passed the
previous step. For each surviving quartet, we guess 64 bits of the final key at
each word, decrypt one round and check the output difference Δ^γ_out. As the
computation at each word can be processed independently, the overall complexity
of this step is dominated by the previous step.

The probability that a false combination of quartets and key bits is counted in
step 2 is upper bounded by 2^{-2w}, where w is the minimum Hamming weight of
the corresponding output difference Δ^γ_out. Therefore, the right key is suggested
4 + 2^{-140} · 2^{-2w} ≈ 4 times by the right and additional false quartets. On the
other hand, a wrong key is expected to be hit 4 · 2^{-2w} + 2^{-140} · 2^{-2w} ≈ 2^{-2} times.
Note that this only holds for the words having an XOR difference of Hamming
weight two; for the rest the number of hits is strictly less than 2^{-2}. We can use
the Poisson distribution to calculate the success rate of our attack. For an expected
number of 2^{-2}, the probability that a wrong key is suggested at most once is
0.97. However, the probability that the right key is suggested more than once is
more than 0.90. Therefore, we can find the right key or at least eliminate most
of the keys with high probability. The complexity of the rest of the attack is
dominated by the first step.
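
The two Poisson figures can be reproduced directly. The sketch below is ours and assumes, as the text does, that the number of times a key is suggested follows a Poisson distribution with mean 2^{-2} for a wrong key and 4 for the right key.

```python
import math


def poisson_cdf(k, lam):
    """P[X <= k] for a Poisson variable X of mean lam."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


if __name__ == "__main__":
    wrong_at_most_once = poisson_cdf(1, 2.0 ** -2)
    right_more_than_once = 1.0 - poisson_cdf(1, 4.0)
    print(round(wrong_at_most_once, 2), round(right_more_than_once, 3))
```

A key can therefore be kept when it is suggested at least twice: this retains the right key with probability above 0.90 while discarding about 97% of the wrong candidates.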


7 Conclusion 


We applied a wide range of attack strategies to the core algorithm of Skein 
(the block cipher Threefish-512), culminating with a distinguisher on 35-round 
Threefish-512, and a key-recovery attack on 32 rounds. Other versions of Three- 
fish are vulnerable to similar attack strategies (for example, our related-key 
boomerang distinguisher works on up to 33 rounds of Threefish-256). To the 
best of our knowledge, this is the first application of a key-recovery boomerang
attack to an “ARX” algorithm, and also the first application of the boomerang 
technique to known-key distinguishers. 

Despite its relative simplicity, the full Threefish seems to resist state-of-the- 
art cryptanalytic techniques. Its balanced “ARX” structure combined with large 
words provides a good balance between diffusion and non-linearity, and avoids 
any particular structure exploitable by attackers. Using attacks on Threefish 
to attack the hash function Skein (or its compression function) seems difficult, 
because of the rather complex mode of operation of Skein. Although none of our 
attacks directly extends to the hash mode, the pseudorandomness of Threefish 
is required to validate the security proofs on Skein. Hence, 36 or more rounds of 
Threefish seem to be required to provide optimal security. 

Future work might apply the recent rebound attack to Threefish, although
it looks difficult to combine it with the trick discussed in §3.1: this
forces the attacker to use specific differences. Another research direction relates
to the optimization of boomerang known- or chosen-key distinguishers.


A Conforming Pair for the 4-Round Differential


When the key and the tweak are zero, the following two message blocks conform
to the differential described in §3.2:


E979D16280002004 32B29AE900000000 D921590E00000000 5771CC9000000400 


A62FF22800000000 484B245000040080 D3BEA4E800008010 7A72784300000000 


A971917200100020 72B2DAE980002004 DD61588E01000400 5331CC1000000000 
A62FF22800040090 C84B245000000000 D1BEA4E800000000 FA72784300008010 


B Examples of Near Collisions


We provide an example of near collision on 459 bits for the reduced compression 
function of Skein’s UBI mode. Both inputs always have kg =--- = ky = ky = 0, 
and 


k5 = C0DEC0DEC0DEC0DE.


On the 16-round compression function, the first input has message block 


E979D16280002004 32B29AE900000000 D921590E00000000 5771CC9000000400 
A62FF22800000000 484B245000040080 D3BEA4E800008010 7A72784300000000 


and 


k6 = 6B9B2C1000000000    t0 = 3F213F213F213F22    t1 = 9464D3F000000000

The second input has message block


A971917200100020 72B2DAE980002004 DD61588E01000400 5331CC1000000000 
A62FF22800040090 C84B245000000000 D1BEA4E800000000 FA72784300008010 


and 
k6 = 6B9B2C1000000000    t0 = BF213F213F213F22    t1 = 9464D3F000000000

The corresponding digests are respectively 


2A6DE91E3E8CDE3B BADAF451F59D3145 7C298A43FB73463F D8309C9E9E2594D5 
35431D226A2022F3 0EA42EB45F9EEEB9 DF038EECD6504300 588A798B1266D67A
and 


6A65A80EBE9SCFF1IF FADAB450759D1141 78618AC3FA73463F 5C709C1A9E2590D5 
B5431D226A242273 SEAE2FF45B9A6A39 5SDO38EECD650C310 DO8E788B1266576A 


Abstract. In this paper, an improved differential cryptanalysis framework for
finding collisions in hash functions is provided. Its principle is based on lineariza- 
tion of compression functions in order to find low weight differential characteris- 
tics as initiated by Chabaud and Joux. This is formalized and refined however in 
several ways: for the problem of finding a conforming message pair whose differ- 
ential trail follows a linear trail, a condition function is introduced so that finding 
a collision is equivalent to finding a preimage of the zero vector under the con- 
dition function. Then, the dependency table concept shows how much influence 
every input bit of the condition function has on each output bit. Careful analysis 
of the dependency table reveals degrees of freedom that can be exploited in ac- 
celerated preimage reconstruction under the condition function. These concepts 
are applied to an in-depth collision analysis of reduced-round versions of the two 
SHA-3 candidates CubeHash and MD6, and are demonstrated to give by far the 
best currently known collision attacks on these SHA-3 candidates. 


Keywords: Hash functions, collisions, differential attack, SHA-3, CubeHash and 
MD6. 


1 Introduction 


Hash functions are important cryptographic primitives that find applications in many 
areas including digital signatures and commitment schemes. A hash function is a trans- 
formation which maps a variable-length input to a fixed-size output, called message 
digest. One expects a hash function to possess several security properties, one of which 
is collision resistance. Being collision resistant informally means that it is hard to find
two distinct inputs which map to the same output value. In practice, hash functions
are mostly built from a fixed input size compression function, e.g. using the renowned
Merkle-Damgård construction. To any hash function, no matter how it has been designed, we
can always attribute fixed input size compression functions, such that a collision for 
a derived compression function results in a direct collision for the hash function itself. 
This way, firstly we are working with fixed input size compression functions rather than 
varying input size ones, secondly we can attribute compression functions to those hash 
functions which are not explicitly based on a fixed input size compression function, and 


* An extended version is available at http://eprint.iacr.org/2009/382

thirdly we can derive different compression functions from a hash function. For example,
the multi-block collision attack [27] benefits from the third point. Our task is to find two
messages for an attributed compression function such that their digests are preferably 
equal (a collision) or differ in only a few bits (a near-collision). 

The goal of this work is to revisit collision-finding methods using linearization of 
the compression function in order to find differential characteristics for the
compression function. This method was initiated by Chabaud and Joux on SHA-0 [11] and was
later extended and applied to SHA-1 by Rijmen and Oswald [26]. The recent attack on
EnRUPT by Indesteege and Preneel is another application of the method. In
particular, it was observed that the codewords of a linear code, which are defined
through a linearized version of the compression function, can be used to identify differ- 
ential paths leading to a collision for the compression function itself. This method was 
later extended by Pramstaller et al. with the general conclusion that finding high 
probability differential paths is related to low weight codewords of the attributed linear 
code. In this paper we further investigate this issue. 

The first contribution of our work is to present a more concrete and tangible relation 
between the linearization and differential paths. In the case that modular addition is the 
only involved nonlinear operation, our results can be stated as follows. Given the parity 
check matrix $H$ of a linear code, and two matrices $A$ and $B$, find a codeword $\Delta$ such that
$A\Delta \vee B\Delta$ is of low weight. This is clearly different from the problem of finding a low
weight codeword $\Delta$. We then consider the problem of finding a conforming message
pair for a given differential trail for a certain linear approximation of the compression 
function. We show that the problem of finding conforming pairs can be reformulated as 
finding preimages of zero under a function which we call the condition function. We 
then define the concept of dependency table which shows how much influence every 
input bit of the condition function has on each output bit. By carefully analyzing the 
dependency table, we are able to profit not only from neutral bits [7] but also from
probabilistic neutral bits [2] in a backtracking search algorithm, similar to [24] and related work.
This contributes to a better understanding of the use of freedom degrees.

We consider compression functions working with n-bit words. In particular, we fo- 
cus on those using modular addition of n-bit words as the only nonlinear operation. The 
incorporated linear operations are XOR, shift and rotation of n-bit words in practice. 
We present our framework in detail for these constructions by approximating modular 
addition with XOR. We demonstrate its validity by applying it on reduced-round vari- 
ants of CubeHash [4] (one of the NIST SHA-3 competitors) which uses addition, 
XOR and rotation. CubeHash instances are parametrized by two parameters r and b 
and are denoted by CubeHash-r/b which process b message bytes per iteration; each 
iteration is composed of r rounds. Although we can not break the original submission 
CubeHash-8/1, we provide real collisions for the much weaker variants CubeHash- 
3/64 and CubeHash-4/48. Interestingly, we show that the more secure variants
CubeHash-6/16 and CubeHash-7/64 do not provide the desired collision security for
512-bit digests either, by giving theoretical attacks with complexities $2^{222.6}$ and $2^{203.0}$
respectively. Nor is CubeHash-6/4 with 512-bit digests second-preimage resistant:
with probability $2^{-478}$ a second preimage can be produced by only one hash
evaluation. Our theory can be easily generalized to arbitrary nonlinear operations. We discuss
this issue and as an application we provide collision attacks on 16 rounds of MD6 [23].
MD6 is another SHA-3 candidate whose original number of rounds varies from 80 to 
168 when the digest size ranges from 160 to 512 bits. 


2 Linear Differential Cryptanalysis 


Let’s consider a compression function H = Compress( M, V) which works with n-bit 
words and maps an m-bit message M and a v-bit initial value V into an h-bit output H. 
Our aim is to find a collision for such compression functions with a randomly given ini- 
tial value V. In this section we consider modular-addition-based Compress functions, 
that is, they use only modular additions in addition to linear transformations. This in- 
cludes the family of AXR (Addition-XOR-Rotation) hash functions which are based 
on these three operations. In a later section we generalize our framework to other families of
compression functions. For these Compress functions, we are looking for two messages
with a difference $\Delta$ that result in a collision. In particular we are interested in a $\Delta$ for
which two randomly chosen messages with this difference lead to a collision with a high
probability for a randomly chosen initial value. For modular-addition-based Compress
functions, we consider a linearized version for which all additions are replaced by XOR.
This is a common linear approximation of addition. Other possible linear approximations
of modular addition, which are less addressed in the literature, can be considered
according to that generalization. As addition was the only nonlinear operation,
we now have a linear function which we call $\mathrm{Compress}_{\mathrm{lin}}$. Since
$\mathrm{Compress}_{\mathrm{lin}}(M, V) \oplus \mathrm{Compress}_{\mathrm{lin}}(M \oplus \Delta, V) = \mathrm{Compress}_{\mathrm{lin}}(\Delta, 0)$
is independent of the value of $V$, we adopt the notation
$\mathrm{Compress}_{\mathrm{lin}}(M) = \mathrm{Compress}_{\mathrm{lin}}(M, 0)$ instead. Let $\Delta$ be an
element of the kernel of the linearized compression function, i.e. $\mathrm{Compress}_{\mathrm{lin}}(\Delta) = 0$.
We are interested in the probability $\Pr\{\mathrm{Compress}(M, V) \oplus \mathrm{Compress}(M \oplus \Delta, V) = 0\}$
for a random $M$ and $V$. In the following we present an algorithm which computes this
probability, called the raw (or bulk) probability.


2.1 Computing the Raw Probability 


We consider a general n-bit vector $x = (x_0, \ldots, x_{n-1})$ as an n-bit integer denoted by
the same variable, i.e. $x = \sum_{i=0}^{n-1} x_i 2^i$. The Hamming weight of a binary vector or an
integer $x$, $\mathrm{wt}(x)$, is the number of its nonzero elements, i.e. $\mathrm{wt}(x) = \sum_{i=0}^{n-1} x_i$. We
use $+$ for modular addition of words and $\oplus$, $\vee$ and $\wedge$ for bit-wise XOR, OR and AND
logical operations between words as well as vectors. We use the following lemma which
is a special case of the problem of computing $\Pr\{((A \oplus \alpha) + (B \oplus \beta)) \oplus (A + B) = \gamma\}$
where $\alpha$, $\beta$ and $\gamma$ are constants and $A$ and $B$ are independent and uniform random
variables, all of them being n-bit words. Lipmaa and Moriai have presented an efficient
algorithm for computing this probability [9]. We are interested in the case $\gamma = \alpha \oplus \beta$
for which the desired probability has a simple closed form.


Lemma 1. $\Pr\{((A \oplus \alpha) + (B \oplus \beta)) \oplus (A + B) = \alpha \oplus \beta\} = 2^{-\mathrm{wt}((\alpha \vee \beta) \wedge (2^{n-1} - 1))}$.
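
As a sanity check on Lemma 1, the following Python sketch compares the closed form with an exhaustive count over all pairs of n-bit words. The function names and the small word size used in the usage note are illustrative assumptions, not part of the paper.

```python
from itertools import product


def xor_like_probability(alpha: int, beta: int, n: int) -> float:
    """Exact probability, over all n-bit A and B, that modular addition
    propagates the input differences (alpha, beta) exactly like XOR."""
    mask = (1 << n) - 1
    hits = 0
    for a, b in product(range(1 << n), repeat=2):
        lhs = (((a ^ alpha) + (b ^ beta)) & mask) ^ ((a + b) & mask)
        hits += lhs == (alpha ^ beta)
    return hits / (1 << (2 * n))


def lemma1_prediction(alpha: int, beta: int, n: int) -> float:
    """Closed form of Lemma 1: 2^(-wt((alpha OR beta) with the MSB masked out))."""
    weight = bin((alpha | beta) & ((1 << (n - 1)) - 1)).count("1")
    return 2.0 ** -weight
```

For a small word size such as n = 4, the two functions can be compared for every pair (alpha, beta); a difference located only in the most significant bit costs nothing, which is why the MSB is masked out.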


Lemma 1 gives us the probability that modular addition behaves like the XOR
operation. As $\mathrm{Compress}_{\mathrm{lin}}$ approximates Compress by replacing modular addition with
XOR, we can then devise a simple algorithm to compute (estimate) the raw probability
$\Pr\{\mathrm{Compress}(M, V) \oplus \mathrm{Compress}(M \oplus \Delta, V) = \mathrm{Compress}_{\mathrm{lin}}(\Delta)\}$. Let's first introduce
some notation.


Notation. Let $n_{add}$ denote the number of additions which Compress uses in total. In
the course of evaluation of $\mathrm{Compress}(M, V)$, let the two addends of the i-th addition
($1 \le i \le n_{add}$) be denoted by $A^i(M, V)$ and $B^i(M, V)$, for which the ordering is not
important. The value $C^i(M, V) = (A^i(M, V) + B^i(M, V)) \oplus A^i(M, V) \oplus B^i(M, V)$
is then called the carry word of the i-th addition. Similarly, in the course of evaluation
of $\mathrm{Compress}_{\mathrm{lin}}(\Delta)$, denote the two inputs of the i-th linearized addition by $\alpha^i(\Delta)$ and
$\beta^i(\Delta)$ in which the ordering is the same as that for $A^i$ and $B^i$. We define five more
functions $\mathbf{A}(M, V)$, $\mathbf{B}(M, V)$, $\mathbf{C}(M, V)$, $\boldsymbol{\alpha}(\Delta)$ and $\boldsymbol{\beta}(\Delta)$ with $(n-1)n_{add}$-bit
outputs. These functions are defined as the concatenation of all the $n_{add}$ relevant words
excluding their MSBs. For example $\mathbf{A}(M, V)$ and $\boldsymbol{\alpha}(\Delta)$ are respectively the
concatenation of the $n_{add}$ words $(A^1(M, V), \ldots, A^{n_{add}}(M, V))$ and $(\alpha^1(\Delta), \ldots, \alpha^{n_{add}}(\Delta))$
excluding the MSBs.
Using this notation, the raw probability can be simply estimated as follows.


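Lemma 2. Let Compress be a modular-addition-based compression function. Then for
any message difference $\Delta$ and for random values $M$ and $V$,
$p_\Delta = 2^{-\mathrm{wt}(\boldsymbol{\alpha}(\Delta) \vee \boldsymbol{\beta}(\Delta))}$ is
a lower bound for $\Pr\{\mathrm{Compress}(M, V) \oplus \mathrm{Compress}(M \oplus \Delta, V) = \mathrm{Compress}_{\mathrm{lin}}(\Delta)\}$.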


Proof. We start with the following definition. 


Definition 1. We say that a message $M$ (for a given $V$) conforms to (or follows) the
trail of $\Delta$ iff$^1$


$((A^i \oplus \alpha^i) + (B^i \oplus \beta^i)) \oplus (A^i + B^i) = \alpha^i \oplus \beta^i$, for $1 \le i \le n_{add}$, (1)


where $A^i$, $B^i$, $\alpha^i$ and $\beta^i$ are shortened forms for $A^i(M, V)$, $B^i(M, V)$, $\alpha^i(\Delta)$ and
$\beta^i(\Delta)$, respectively.


It is not difficult to prove that under some reasonable independence assumptions $p_\Delta$,
which we call conforming probability, is the probability that a random message $M$
follows the trail of $\Delta$. This is a direct corollary of Lemma 1 and Definition 1. The exact
proof can be done by induction on $n_{add}$, the number of additions in the compression
function. Due to other possible non-conforming pairs that start from message difference
$\Delta$ and lead to output difference $\mathrm{Compress}_{\mathrm{lin}}(\Delta)$, $p_\Delta$ is a lower bound for the desired
probability in the lemma.
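
The quantities in Lemma 2 can be made concrete on a toy example. The sketch below is a hypothetical two-word ARX function on 8-bit words (it is not CubeHash or MD6, and every name in it is an assumption made for illustration). It records the addends of each addition, derives the trail $(\alpha^i, \beta^i)$ from the linearized function, and provides predicates for "M conforms to the trail" and "the output difference equals the linear one".

```python
N = 8                       # toy word size (assumption)
MASK = (1 << N) - 1
LOW = MASK >> 1             # every bit position except the MSB


def rotl(x, r):
    return ((x << r) | (x >> (N - r))) & MASK


def toy_compress(m, v, linear=False):
    """Hypothetical ARX function on two message words and two IV words.
    Returns (output, list of addend pairs, one pair per addition)."""
    addends = []

    def add(a, b):
        addends.append((a, b))
        return a ^ b if linear else (a + b) & MASK

    x, y = v[0] ^ m[0], v[1]
    y = add(y, x)
    x = rotl(x, 3) ^ y ^ m[1]
    x = add(x, y)
    y = rotl(y, 5) ^ x
    return (x, y), addends


def xor2(p, q):
    return (p[0] ^ q[0], p[1] ^ q[1])


def linear_trail(delta):
    """Compress_lin(delta) together with the pairs (alpha^i, beta^i)."""
    return toy_compress(delta, (0, 0), linear=True)


def raw_weight(delta):
    """wt(alpha(delta) OR beta(delta)), MSBs excluded, i.e. -log2 of p_delta."""
    _, trail = linear_trail(delta)
    return sum(bin((a | b) & LOW).count("1") for a, b in trail)


def conforms(m, v, delta):
    """Definition 1: every addition behaves like XOR on the trail differences."""
    _, trail = linear_trail(delta)
    _, addends = toy_compress(m, v)
    return all(
        ((((A ^ a) + (B ^ b)) & MASK) ^ ((A + B) & MASK)) == (a ^ b)
        for (A, B), (a, b) in zip(addends, trail)
    )


def follows_output(m, v, delta):
    """True if the real output difference equals Compress_lin(delta)."""
    out1, _ = toy_compress(m, v)
    out2, _ = toy_compress(xor2(m, delta), v)
    return xor2(out1, out2) == linear_trail(delta)[0]
```

Every conforming message necessarily satisfies `follows_output`, while the converse need not hold; this is the reason $p_\Delta$ is only a lower bound in Lemma 2.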


If $\mathrm{Compress}_{\mathrm{lin}}(\Delta)$ is of low Hamming weight, we get a near collision in the output. The
interesting $\Delta$'s for collision search are those which belong to the kernel of $\mathrm{Compress}_{\mathrm{lin}}$,
i.e. those that satisfy $\mathrm{Compress}_{\mathrm{lin}}(\Delta) = 0$. From now on, we assume that $\Delta \ne 0$
is in the kernel of $\mathrm{Compress}_{\mathrm{lin}}$, hence looking for collisions. According to Lemma 2,
one needs to try around $1/p_\Delta$ random message pairs in order to find a collision which
conforms to the trail of $\Delta$. However in a random search it is better not to restrict oneself
to the conforming messages as a collision at the end is all we want. Since $p_\Delta$ is a lower
bound for the probability of getting a collision for a message pair with difference $\Delta$,
we might get a collision sooner. In Section 3 we explain a method which might find a
conforming message by avoiding random search.


$^1$ If and only if.


2.2 Link with Coding Theory 


We would like to conclude this section with a note on the relation between the fol- 
lowing two problems: (I) finding low-weight codewords of a linear code, (II) finding a 
high probability linear differential path. Since the functions $\mathrm{Compress}_{\mathrm{lin}}(\Delta)$, $\boldsymbol{\alpha}(\Delta)$
and $\boldsymbol{\beta}(\Delta)$ are linear, we consider $\Delta$ as a column vector and attribute three matrices
$H$, $A$ and $B$ to these three transformations, respectively. In other words we have
$\mathrm{Compress}_{\mathrm{lin}}(\Delta) = H\Delta$, $\boldsymbol{\alpha}(\Delta) = A\Delta$ and $\boldsymbol{\beta}(\Delta) = B\Delta$. We then call $H$ the parity
check matrix of the compression function.

Based on an initial work by Chabaud and Joux [11], the link between these two
problems has been discussed by Rijmen and Oswald [26] and by Pramstaller et al.,
with the general conclusion that finding highly probable differential paths is related
to low weight codewords of the attributed linear code. In fact the relation between
these two problems is more delicate. For problem (I), we are provided with the parity
check matrix $H$ of a linear code for which a codeword $\Delta$ satisfies the relation $H\Delta = 0$.
Then, we are supposed to find a low-weight nonzero codeword $\Delta$. This problem is
believed to be hard and there are some heuristic approaches for it.
For problem (II), however, we are given three matrices $H$, $A$ and $B$ and need to find a
nonzero $\Delta$ such that $H\Delta = 0$ and $A\Delta \vee B\Delta$ is of low weight, see Lemma 2.
Nevertheless, low-weight codewords $\Delta$ of the matrix $H$ might be good candidates for providing
low-weight $A\Delta \vee B\Delta$, i.e. differential paths with high probability $p_\Delta$. In particular, this
approach is promising if these three matrices are sparse.


3 Finding a Conforming Message Pair Efficiently 


The methods that are used to accelerate the finding of a message which satisfies some 
requirements are referred to as freedom degrees use in the literature. This includes 
message modifications [27], neutral bits [7], boomerang attacks, tunnels
and submarine modifications [21]. In this section we show that the problem of finding 
conforming message pairs can be reformulated as finding preimages of zero under a 
function which we call the condition function. One can carefully analyze the condition 
function to see how freedom degrees might be used in efficient preimage reconstruc- 
tion. Our method is based on measuring the amount of influence which every input bit 
has on each output bit of the condition function. We introduce the dependency tables to 
distinguish the influential bits, from those which have no influence or are less influen- 
tial. In other words, in case the condition function does not mix its input bits well, we 
profit not only from neutral bits [7] but also from probabilistic neutral bits [2]. This is 
achieved by devising a backtracking search algorithm, similar to [24] and related work, based on
the dependency table.


3.1 Condition Function


Let's assume that we have a differential path for the message difference $\Delta$ which holds
with probability $p_\Delta = 2^{-y}$. According to Lemma 2 we have $y = \mathrm{wt}(\boldsymbol{\alpha}(\Delta) \vee \boldsymbol{\beta}(\Delta))$.
In this section we show that, given an initial value $V$, the problem of finding a
conforming message pair such that $\mathrm{Compress}(M, V) \oplus \mathrm{Compress}(M \oplus \Delta, V) = 0$ can
be translated into finding a message $M$ such that $\mathrm{Condition}_\Delta(M, V) = 0$. Here
$Y = \mathrm{Condition}_\Delta(M, V)$ is a function which maps an m-bit message $M$ and a v-bit initial value
$V$ into a y-bit output $Y$. In other words, the problem is reduced to finding a preimage of
zero under the $\mathrm{Condition}_\Delta$ function. As we will see it is quite probable that not every
output bit of the Condition function depends on all the message input bits. By taking a
good strategy, this property enables us to find the preimages under this function more
efficiently than random search. But of course, we are only interested in preimages of
zero. In order to explain how we derive the function Condition from Compress we first
present a quite easy-to-prove lemma. We recall that the carry word of two words $A$ and
$B$ is defined as $C = (A + B) \oplus A \oplus B$.


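Lemma 3. Let $A$ and $B$ be two n-bit words and $C$ represent their carry word. Let
$\delta = 2^i$ for $0 \le i \le n - 2$. Then,


$((A \oplus \delta) + (B \oplus \delta)) = (A + B) \Leftrightarrow A_i \oplus B_i \oplus 1 = 0$, (2)
$(A + (B \oplus \delta)) = (A + B) \oplus \delta \Leftrightarrow A_i \oplus C_i = 0$, (3)

and similarly
$((A \oplus \delta) + B) = (A + B) \oplus \delta \Leftrightarrow B_i \oplus C_i = 0$. (4)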


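For a given difference $\Delta$, a message $M$ and an initial value $V$, let $A_k$, $B_k$, $C_k$, $\alpha_k$ and
$\beta_k$, $0 \le k < (n-1)n_{add}$, respectively denote the k-th bit of the output vectors of the
functions $\mathbf{A}(M, V)$, $\mathbf{B}(M, V)$, $\mathbf{C}(M, V)$, $\boldsymbol{\alpha}(\Delta)$ and $\boldsymbol{\beta}(\Delta)$, as defined in Section 2.1.
Let $\{i_0, \ldots, i_{y-1}\}$, $0 \le i_0 < i_1 < \cdots < i_{y-1} < (n-1)n_{add}$ be the positions of 1's in
the vector $\boldsymbol{\alpha} \vee \boldsymbol{\beta}$. We define the function $Y = \mathrm{Condition}_\Delta(M, V)$ as:


$Y_j = A_{i_j} \oplus B_{i_j} \oplus 1$ if $(\alpha_{i_j}, \beta_{i_j}) = (1, 1)$,
$Y_j = A_{i_j} \oplus C_{i_j}$ if $(\alpha_{i_j}, \beta_{i_j}) = (0, 1)$, (5)
$Y_j = B_{i_j} \oplus C_{i_j}$ if $(\alpha_{i_j}, \beta_{i_j}) = (1, 0)$,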
for j = 0,1,...,y — 1. This equation can be equivalently written as equation (J). 


Proposition 1. For a given $V$ and $\Delta$, a message $M$ conforms to the trail of $\Delta$ iff
$\mathrm{Condition}_\Delta(M, V) = 0$.
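
For a single addition, Proposition 1 can be checked exhaustively. The following Python sketch, with function names chosen only for illustration, builds the condition bits of equation (5) from the addends and their carry word, and tests conformance in the sense of equation (1).

```python
from itertools import product


def condition_bits(A, B, alpha, beta, n):
    """Condition bits of equation (5) for one n-bit addition A + B
    with input differences alpha and beta (MSB position excluded)."""
    mask = (1 << n) - 1
    C = ((A + B) & mask) ^ A ^ B              # carry word of the addition
    bits = []
    for i in range(n - 1):                    # positions 0 .. n-2 only
        a, b = (alpha >> i) & 1, (beta >> i) & 1
        Ai, Bi, Ci = (A >> i) & 1, (B >> i) & 1, (C >> i) & 1
        if (a, b) == (1, 1):
            bits.append(Ai ^ Bi ^ 1)
        elif (a, b) == (0, 1):
            bits.append(Ai ^ Ci)
        elif (a, b) == (1, 0):
            bits.append(Bi ^ Ci)
    return bits


def conforms(A, B, alpha, beta, n):
    """Equation (1): the addition propagates (alpha, beta) like XOR."""
    mask = (1 << n) - 1
    return ((((A ^ alpha) + (B ^ beta)) & mask) ^ ((A + B) & mask)) == (alpha ^ beta)
```

A pair (A, B) conforms exactly when all returned condition bits are zero; in a full compression function the same bits are collected over all $n_{add}$ additions.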


3.2 Dependency Table for Freedom Degrees Use 


For simplicity and generality, let's adopt the notation $F(M, V) = \mathrm{Condition}_\Delta(M, V)$
in this section. Assume that we are given a general function $Y = F(M, V)$ which maps
m message bits and v initial value bits into y output bits. Our goal is to reconstruct
preimages of a particular output, for example the zero vector, efficiently. More precisely,
we want to find $V$ and $M$ such that $F(M, V) = 0$. If $F$ mixes its input bits very well,
one needs to try about $2^y$ random inputs in order to find one mapping to zero. However,
in some special cases, not every input bit of $F$ affects every output bit. Consider an ideal
situation where message bits and output bits can be divided into $\ell$ and $\ell + 1$ disjoint
subsets respectively as $\bigcup_{i=1}^{\ell} \mathcal{M}_i$ and $\bigcup_{i=0}^{\ell} \mathcal{Y}_i$ such that the output bits $\mathcal{Y}_j$ ($0 \le j \le \ell$)
only depend on the input bits $\bigcup_{i=1}^{j} \mathcal{M}_i$ and the initial value $V$. In other words, once
we know the initial value $V$, we can determine the output part $\mathcal{Y}_0$. If we know the
initial value $V$ and the input portion $\mathcal{M}_1$, the output part $\mathcal{Y}_1$ is then known and so
on. Refer to the MD6 application to see the partitioning of a condition function related to MD6.
This property of $F$ suggests Algorithm 1 for finding a preimage of zero. Algorithm 1
is a backtracking search algorithm in essence, similar to [24] and related work, and in practice
is implemented recursively with a tree-based search to avoid memory requirements.
The values $q_0, q_1, \ldots, q_\ell$ are the parameters of the algorithm to be determined later. To
discuss the complexity of the algorithm, let $|\mathcal{M}_i|$ and $|\mathcal{Y}_i|$ denote the cardinality of $\mathcal{M}_i$
and $\mathcal{Y}_i$ respectively, where $|\mathcal{Y}_0| \ge 0$ and $|\mathcal{Y}_i| \ge 1$ for $1 \le i \le \ell$. We consider an ideal
behavior of $F$ for which each output part depends in a complex way on all the variables
that it depends on. Thus, the output segment changes independently and uniformly at
random if we change any part of the relevant input bits.


Algorithm 1. Preimage finding

Require: $q_0, q_1, \ldots, q_\ell$

Ensure: some preimage of zero under $F$
0: Choose $2^{q_0}$ initial values at random and keep those $2^{q'_1}$ candidates which make the $\mathcal{Y}_0$ part null.
1: For each candidate, choose $2^{q_1 - q'_1}$ values for $\mathcal{M}_1$ and keep those $2^{q'_2}$ ones making $\mathcal{Y}_1$ null.
2: For each candidate, choose $2^{q_2 - q'_2}$ values for $\mathcal{M}_2$ and keep those $2^{q'_3}$ ones making $\mathcal{Y}_2$ null.
...
i: For each candidate, choose $2^{q_i - q'_i}$ values for $\mathcal{M}_i$ and keep those $2^{q'_{i+1}}$ ones making $\mathcal{Y}_i$ null.
...
$\ell$: For each candidate, choose $2^{q_\ell - q'_\ell}$ values for $\mathcal{M}_\ell$ and keep those $2^{q'_{\ell+1}}$ final candidates
making $\mathcal{Y}_\ell$ null.


To analyze the algorithm, we need to compute the optimal values for $q_0, \ldots, q_\ell$. The
time complexity of the algorithm is $\sum_{i=0}^{\ell} 2^{q_i}$ as at each step $2^{q_i}$ values are examined.
The algorithm is successful if we have at least one candidate left at the end, i.e. $q'_{\ell+1} \ge 0$.
We have $q'_{i+1} \approx q_i - |\mathcal{Y}_i|$, coming from the fact that at the i-th step $2^{q_i}$ values are
examined each of which makes the portion $\mathcal{Y}_i$ of the output null with probability $2^{-|\mathcal{Y}_i|}$.
Note that we have the restrictions $q_i - q'_i \le |\mathcal{M}_i|$ and $0 \le q'_i$ since we have $|\mathcal{M}_i|$ bits
of freedom degree at the i-th step and we require at least one surviving candidate after
each step. Hence, the optimal values for the $q_i$'s can be recursively computed as
$q_{i-1} = |\mathcal{Y}_{i-1}| + \max(0, q_i - |\mathcal{M}_i|)$ for $i = \ell, \ell - 1, \ldots, 1$ with $q_\ell = |\mathcal{Y}_\ell|$.
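
The recursion for the parameters is easy to evaluate. The sketch below is a direct transcription into Python under the ideal-behavior assumption of this section; the function names and the partition sizes in the usage note are made up for illustration.

```python
import math


def optimal_parameters(msg_sizes, out_sizes):
    """msg_sizes = [|M_1|, ..., |M_l|], out_sizes = [|Y_0|, ..., |Y_l|].
    Returns [q_0, ..., q_l] from q_{i-1} = |Y_{i-1}| + max(0, q_i - |M_i|)
    with q_l = |Y_l|."""
    l = len(msg_sizes)
    if len(out_sizes) != l + 1:
        raise ValueError("need one more output part than message parts")
    q = [0] * (l + 1)
    q[l] = out_sizes[l]
    for i in range(l, 0, -1):
        q[i - 1] = out_sizes[i - 1] + max(0, q[i] - msg_sizes[i - 1])
    return q


def log2_complexity(q):
    """log2 of the total work sum_i 2^{q_i}."""
    return math.log2(sum(2.0 ** qi for qi in q))
```

For instance, with three message parts of 8 bits each and output parts of sizes 0, 4, 10 and 12, `optimal_parameters([8, 8, 8], [0, 4, 10, 12])` yields `[2, 10, 14, 12]`, so the work is dominated by the $2^{14}$ term, compared with about $2^{26}$ trials for a random search on 26 output bits.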

How can we determine the partitions M; and V; for a given function F? We pro- 
pose the following heuristic method for determining the message and output partitions 
in practice. We first construct a $y \times m$ binary valued table $T$ called dependency table.
The entry $T_{i,j}$, $0 \le i \le m - 1$ and $0 \le j \le y - 1$, is set to one iff the j-th output bit
is highly affected by the i-th message bit. To this end we empirically measure the
probability that changing the i-th message bit changes the j-th output bit. The probability
is computed over random initial values and messages. We then set $T_{i,j}$ to one iff this
probability is greater than a threshold $0 \le th < 0.5$, for example $th = 0.3$. We then
call Algorithm 2.
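
A minimal sketch of the empirical measurement described above, in Python. The interface (a function `F` taking and returning integers, and the sample count and seed) is an assumption made for illustration; the table is stored with one row per output bit and one column per message bit, matching the $y \times m$ shape. The toy function `toy_F` is hypothetical and only serves as a usage example.

```python
import random


def dependency_table(F, m_bits, v_bits, y_bits, threshold=0.3, samples=200, seed=1):
    """Estimate the dependency table of F(M, V).
    table[j][i] == 1 iff flipping message bit i flips output bit j
    with empirical probability greater than `threshold`."""
    rng = random.Random(seed)
    counts = [[0] * m_bits for _ in range(y_bits)]
    for _ in range(samples):
        M, V = rng.getrandbits(m_bits), rng.getrandbits(v_bits)
        base = F(M, V)
        for i in range(m_bits):
            diff = base ^ F(M ^ (1 << i), V)
            for j in range(y_bits):
                counts[j][i] += (diff >> j) & 1
    return [[int(c / samples > threshold) for c in row] for row in counts]


def toy_F(M, V):
    """Hypothetical 4-bit-message, 2-bit-IV, 3-bit-output function."""
    m = [(M >> k) & 1 for k in range(4)]
    v = [(V >> k) & 1 for k in range(2)]
    y0 = m[0]
    y1 = m[0] ^ m[1] ^ v[0]
    y2 = m[2] ^ m[3] ^ v[1]
    return y0 | (y1 << 1) | (y2 << 2)
```

Calling `dependency_table(toy_F, 4, 2, 3)` gives the rows `[1, 0, 0, 0]`, `[1, 1, 0, 0]` and `[0, 0, 1, 1]`, since each output bit of `toy_F` depends on its message bits with probability exactly one or zero.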


Algorithm 2. Message and output partitioning
Require: Dependency table $T$
Ensure: $\ell$, message partitions $\mathcal{M}_1, \ldots, \mathcal{M}_\ell$ and output partitions $\mathcal{Y}_0, \ldots, \mathcal{Y}_\ell$.
1: Put all the output bits j in $\mathcal{Y}_0$ for which the row j of $T$ is all-zero.
2: Delete all the all-zero rows from $T$.
3: $\ell := 0$;
4: while $T$ is not empty do
5:   $\ell := \ell + 1$;
6:   repeat
7:     Determine the column i in $T$ which has the highest number of 1's and delete it from $T$.
8:     Put the message bit which corresponds to the deleted column i into the set $\mathcal{M}_\ell$.
9:   until There is at least one all-zero row in $T$ OR $T$ becomes empty
10:  If $T$ is empty set $\mathcal{Y}_\ell$ to those output bits which are not in $\bigcup_{i=0}^{\ell-1} \mathcal{Y}_i$ and stop.
11:  Put all the output bits j in $\mathcal{Y}_\ell$ for which the corresponding row of $T$ is all-zero.
12:  Delete all the all-zero rows from $T$.
13: end while
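
Algorithm 2 can be sketched in Python as follows. The data layout (a list of rows, one per output bit) and the tie-breaking rule (lowest column index among equally heavy columns) are assumptions of this sketch; the algorithm itself leaves ties unspecified. Treating a table with no remaining columns as having only all-zero rows covers step 10 and steps 11-12 with the same code.

```python
def partition(T):
    """T[j][i] is the dependency-table entry for output bit j and message bit i.
    Returns (message_parts, output_parts) where output_parts[0] is Y_0 and
    message_parts[k - 1], output_parts[k] are M_k and Y_k."""
    if not T:
        return [], [[]]
    rows = {j: list(r) for j, r in enumerate(T)}    # surviving rows of T
    cols = list(range(len(T[0])))                   # surviving columns of T

    def pop_zero_rows():
        zero = [j for j in sorted(rows) if not any(rows[j][i] for i in cols)]
        for j in zero:
            del rows[j]
        return zero

    output_parts = [pop_zero_rows()]                # steps 1-2
    message_parts = []
    while rows and cols:                            # step 4
        part = []
        while True:                                 # steps 6-9
            i = max(cols, key=lambda c: sum(rows[j][c] for j in rows))
            cols.remove(i)
            part.append(i)
            if not cols or any(not any(rows[j][c] for c in cols) for j in rows):
                break
        message_parts.append(part)
        output_parts.append(pop_zero_rows())        # steps 10-12
    return message_parts, output_parts
```

On the table with rows `[1, 0, 0, 0]`, `[1, 1, 0, 0]`, `[0, 0, 1, 1]` this returns the message parts `[[0], [1], [2, 3]]` and the output parts `[[], [0], [1], [2]]`.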


In practice, once we make a partitioning for a given function using the above method, 
there are two issues which may cause the ideal behavior assumption to be violated: 


1. The message segments $\mathcal{M}_1, \ldots, \mathcal{M}_i$ do not have full influence on $\mathcal{Y}_i$,
2. The message segments $\mathcal{M}_{i+1}, \ldots, \mathcal{M}_\ell$ have influence on $\mathcal{Y}_0, \ldots, \mathcal{Y}_i$.


With regard to the first issue, we ideally would like that all the message segments
$\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_i$ as well as the initial value $V$ have full influence on the output part
$\mathcal{Y}_i$. In practice the effect of the last few message segments $\mathcal{M}_{i-d_i}, \ldots, \mathcal{M}_i$ (for some
small integer $d_i$) is more important, though. Theoretical analysis of deviation from this
requirement may not be easy. However, with some tweaks on the tree-based (backtracking)
search algorithm, we may overcome this effect in practice. For example if the
message segment $\mathcal{M}_{i-1}$ does not have a great influence on the output segment $\mathcal{Y}_i$, we
may decide to backtrack two steps at depth $i$, instead of one (the default value). The
reason is as follows. Imagine that you are at depth $i$ of the tree and you are trying to
adjust the i-th message segment $\mathcal{M}_i$, to make the output segment $\mathcal{Y}_i$ null. If after trying
about $2^{\min(|\mathcal{M}_i|, |\mathcal{Y}_i|)}$ choices for the i-th message block, you do not find an appropriate
one, you will go one step backward and choose another choice for the (i - 1)-st message
segment $\mathcal{M}_{i-1}$; you will then go one step forward once you have successfully adjusted
the (i - 1)-st message segment. If $\mathcal{M}_{i-1}$ has no effect on $\mathcal{Y}_i$, this would be useless and
increase our search cost at this node. Hence it would be appropriate if we backtrack
two steps at this depth. In general, we may tweak our tree-based search by setting the
number of steps which we want to backtrack at each depth.

In contrast, the theoretical analysis of the second issue is easy. Ideally, we would
like that the message segments $\mathcal{M}_i, \ldots, \mathcal{M}_\ell$ have no influence on the output segments
$\mathcal{Y}_0, \ldots, \mathcal{Y}_{i-1}$. The smaller the threshold value $th$ is chosen, the less the
influence would be. Let $2^{-p_i}$, $1 \le i \le \ell$, denote the probability that changing the
message segment $\mathcal{M}_i$ does not change any bit from the output segments $\mathcal{Y}_0, \ldots, \mathcal{Y}_{i-1}$.
The probability is computed over random initial values and messages, and a random
non-zero difference in the message segment $\mathcal{M}_i$. Algorithm 1 must be reanalyzed in
order to recompute the optimal values for $q_0, \ldots, q_\ell$. Algorithm 1 also needs to be
slightly changed by ensuring that at step $i$, all the output segments $\mathcal{Y}_0, \ldots, \mathcal{Y}_{i-1}$
remain null. The time complexity of the algorithm is still $\sum_{i=0}^{\ell} 2^{q_i}$ and it is successful
if at least one surviving candidate is left at the end, i.e. $q'_{\ell+1} \ge 0$. However, here we
set $q'_{i+1} \approx q_i - |\mathcal{Y}_i| - p_i$. This comes from the fact that at the i-th step $2^{q_i}$ values are
examined each of which makes the portion $\mathcal{Y}_i$ of the output null with probability $2^{-|\mathcal{Y}_i|}$
and keeps the previously set output segments $\mathcal{Y}_0, \ldots, \mathcal{Y}_{i-1}$ null with probability $2^{-p_i}$
(we assume these two events are independent). Here, our restrictions are again $0 \le q'_i$
and $q_i - q'_i \le |\mathcal{M}_i|$. Hence, the optimal values for the $q_i$'s can be recursively computed as
$q_{i-1} = p_{i-1} + |\mathcal{Y}_{i-1}| + \max(0, q_i - |\mathcal{M}_i|)$ for $i = \ell, \ell - 1, \ldots, 1$ with $q_\ell = |\mathcal{Y}_\ell|$.


Remark 1. When working with functions with a huge number of input bits, it might be 
appropriate to consider the m-bit message M as a string of u-bit units instead of bits. 
For example one can take $u = 8$ and work with bytes. We then use the notation
$M = (M[0], \ldots, M[m/u - 1])$ (assuming $u$ divides $m$) where $M[i] = (M_{iu}, \ldots, M_{iu+u-1})$.
In this case the dependency table must be constructed according to the probability that 
changing every message unit changes each output bit. 


4 Application to CubeHash 


CubeHash [4] is Bernstein’s proposal for the NIST SHA-3 competition [22]. CubeHash 
variants, denoted by CubeHash-r/b, are parametrized by r and b which at each iter- 
ation process b bytes in r rounds. Although CubeHash-8/1 was the original official 
submission, later the designer proposed the tweak CubeHash-16/32 which is almost 
16 times faster than the initial proposal [B]. Nevertheless, the author has encouraged 
cryptanalysis of CubeHash-r/b variants for smaller r’s and bigger b’s. 


4.1 CubeHash Description 


CubeHash works with 32-bit words (n = 32) and uses three simple operations: XOR, 
rotation and modular addition. It has an internal state $S = (S_0, S_1, \ldots, S_{31})$ of 32
words and its variants, denoted by CubeHash-r/b, are identified by two parameters
$r \in \{1, 2, \ldots\}$ and $b \in \{1, 2, \ldots, 128\}$. The internal state $S$ is set to a specified value
which depends on the digest length (limited to 512 bits) and parameters r and b. The 
message to be hashed is appropriately padded and divided into b-byte message blocks. 
At each iteration one message block is processed as follows. The 32-word internal state 
S is considered as a 128-byte value and the message block is XORed into the first b 
bytes of the internal state. Then, the following fixed permutation is applied r times to 
the internal state to prepare it for the next iteration.

1. Add $S_i$ into $S_{i \oplus 16}$, for $0 \le i \le 15$.
2. Rotate $S_i$ to the left by seven bits, for $0 \le i \le 15$.
3. Swap $S_i$ and $S_{i \oplus 8}$, for $0 \le i \le 7$.
4. XOR $S_{i \oplus 16}$ into $S_i$, for $0 \le i \le 15$.
5. Swap $S_i$ and $S_{i \oplus 2}$, for $i \in \{16, 17, 20, 21, 24, 25, 28, 29\}$.
6. Add $S_i$ into $S_{i \oplus 16}$, for $0 \le i \le 15$.
7. Rotate $S_i$ to the left by eleven bits, for $0 \le i \le 15$.
8. Swap $S_i$ and $S_{i \oplus 4}$, for $i \in \{0, 1, 2, 3, 8, 9, 10, 11\}$.
9. XOR $S_{i \oplus 16}$ into $S_i$, for $0 \le i \le 15$.
10. Swap $S_i$ and $S_{i \oplus 1}$, for $i \in \{16, 18, 20, 22, 24, 26, 28, 30\}$.
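
The ten steps above translate almost line by line into code. The following Python sketch of one round is written from this description only; the function names and the `linear` switch (which replaces the modular additions of steps 1 and 6 by XOR, giving the linearized round used in this paper's framework) are assumptions of the sketch. It has not been checked against official CubeHash test vectors.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32


def cubehash_round(state, linear=False):
    """One round on a list of 32 words of 32 bits each.
    With linear=True every modular addition is replaced by XOR."""
    s = list(state)
    # First half uses rotation 7 and swaps at distances 8 and 2,
    # second half uses rotation 11 and swaps at distances 4 and 1.
    for rot, swap_low, swap_high in ((7, 8, 2), (11, 4, 1)):
        for i in range(16):                     # steps 1 and 6
            if linear:
                s[i ^ 16] ^= s[i]
            else:
                s[i ^ 16] = (s[i ^ 16] + s[i]) & MASK32
        for i in range(16):                     # steps 2 and 7
            s[i] = rotl32(s[i], rot)
        for i in range(16):                     # steps 3 and 8
            if not i & swap_low:
                s[i], s[i ^ swap_low] = s[i ^ swap_low], s[i]
        for i in range(16):                     # steps 4 and 9
            s[i] ^= s[i ^ 16]
        for i in range(16, 32):                 # steps 5 and 10
            if not i & swap_high:
                s[i], s[i ^ swap_high] = s[i ^ swap_high], s[i]
    return s
```

With `linear=True` the round is a linear map over GF(2), so the round applied to the XOR of two states equals the XOR of the rounds applied to each state; this is the property that makes the output difference of the linearized function independent of the initial value.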


Having processed all message blocks, a fixed transformation is applied to the final
internal state to extract the hash value as follows. First, the last state word $S_{31}$ is XORed
with integer 1 and then the above permutation is applied $10 \times r$ times to the resulting
internal state. Finally, the internal state is truncated to produce the message digest of
desired hash length. Refer to [4] for the full specification.


4.2 Definition of the Compression Function Compress 


To be in line with our general method, we need to deal with fixed-size input compression
functions. To this end, we consider $t$ ($t \ge 1$) consecutive iterations of CubeHash.
We define the function $H = \mathrm{Compress}(M, V)$ with an 8bt-bit message
$M = M^0 \| \ldots \| M^{t-1}$, a 1024-bit initial value $V$ and a $(1024 - 8b)$-bit output $H$. The initial
value $V$ is used to initialize the 32-word internal state of CubeHash. Each $M^i$ is a b-byte
message block. We start from the initialized internal state and update it in t iterations.
That is, in t iterations the t message blocks $M^0, \ldots, M^{t-1}$ are sequentially processed
in order to transform the internal state into a final value. The output $H$ is then the last
$128 - b$ bytes of the final internal state value which is ready to absorb the (t + 1)-st
message block (the 32-word internal state is interpreted as a 128-byte vector).

Our goal is to find collisions for this Compress function. In the next section we 
explain how collisions can be constructed for CubeHash itself. 


4.3 Collision Construction 


We are planning to construct collision pairs $(M', M'')$ for CubeHash-r/b which are of
the form $M' = M^{pre} \| M \| M^t \| M^{suf}$ and $M'' = M^{pre} \| M \oplus \Delta \| M^t \oplus \Delta^t \| M^{suf}$. Here,
$M^{pre}$ is the common prefix of the colliding pairs whose length in bytes is a multiple of
$b$, $M^t$ is one message block of b bytes and $M^{suf}$ is the common suffix of the colliding
pairs whose length is arbitrary. The message prefix $M^{pre}$ is chosen for randomizing the
initial value $V$. More precisely, $V$ is the content of the internal state after processing
the message prefix $M^{pre}$. For this value of $V$, $(M, M \oplus \Delta)$ is a collision pair for the
compression function, i.e. $\mathrm{Compress}(M, V) = \mathrm{Compress}(M \oplus \Delta, V)$. Remember that
a collision for Compress indicates a collision over the last $128 - b$ bytes of the internal
state. The message blocks $M^t$ and $M^t \oplus \Delta^t$ are used to get rid of the difference in the
first b bytes of the internal state. The difference $\Delta^t$ is called the erasing block difference
and is computed as follows. When we evaluate Compress with inputs $(M, V)$ and
$(M \oplus \Delta, V)$, $\Delta^t$ is the difference in the first b bytes of the final internal state values.
Once we find message prefix $M^{pre}$, message $M$ and difference $\Delta$, any message pair
$(M', M'')$ of the above-mentioned form is a collision for CubeHash for any message
block $M^t$ and any message suffix $M^{suf}$. We find the difference $\Delta$ using the linearization
method of Section 2, applied to CubeHash in the next section. Then, $M^{pre}$ and $M$
are found by finding a preimage of zero under the Condition function as explained in
Section 3. Algorithm 4 in the extended version of this article shows how the CubeHash
Condition function can be implemented in practice for a given differential path.


4.4 Linear Differentials for CubeHash-r/b 


As we explained in Section 2, the linear transformation $\mathrm{Compress}_{lin}$ can be identified by
a matrix $H$. We are interested in $\Delta$'s such that $H\Delta = 0$ and such that the differential
trails have high probability. For CubeHash-r/b with t iterations, $\Delta = \Delta^0 \| \cdots \| \Delta^{t-1}$
and $H$ has size $(1024 - 8b) \times 8bt$, see Section 4.2. This matrix suffers from having low
rank, which enables us to find low-weight vectors of the kernel. We then hope that they
are also good candidates for providing highly probable trails, see Section 2.2. Assume
that this matrix has rank $8bt - \tau$, $\tau > 0$, signifying the existence of $2^{\tau} - 1$ nonzero
solutions to $H\Delta = 0$. To find a low-weight nonzero $\Delta$, we use the following method.

The rank of $H$ being $8bt - \tau$ shows that the solutions can be expressed by identifying
$\tau$ variables as free and expressing the rest in terms of them. Any choice for the
free variables uniquely determines the remaining $8bt - \tau$ variables, hence providing a
unique member of the kernel. We choose a set of $\tau$ free variables at random. Then, we
set one, two, or three of the $\tau$ free variables to bit value 1, and the other $\tau - 1$, $\tau - 2$ or
$\tau - 3$ variables to bit value 0, with the hope of getting a $\Delta$ providing a high-probability
differential path. We have made an exhaustive search over all $\tau + \binom{\tau}{2} + \binom{\tau}{3}$ possible choices
for all $b \in \{1, 2, 3, 4, 8, 16, 32, 48, 64\}$ and $r \in \{1, 2, 3, 4, 5, 6, 7, 8\}$ in order to find
the best characteristics. Table 1 includes the ordered pair $(t, y)$, i.e. the corresponding
number of iterations and the $-\log_2$ probability (number of bit conditions) of the best
raw probability path we found. For most of the cases, the best characteristic belongs to
the minimum value of $t$ for which $\tau > 0$. There are a few exceptions to consider, which
are starred in Table 1. For example, in the CubeHash-3/4 case, while for $t = 2$ we have
$\tau = 4$ and $y = 675$, by increasing the number of iterations to $t = 4$ we get $\tau = 40$
and a better characteristic with $y = 478$. This may hold for other cases as well, since we
only increased $t$ until our program terminated in a reasonable time. We would like to
emphasize that since we are using linear differentials, the erasing block difference $\Delta^t$
only depends on the difference $\Delta$, see Section 4.3.
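The kernel search described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the matrix is given as a list of integer bitmasks, the free variables are the non-pivot columns produced by Gaussian elimination (rather than a random choice), and the Hamming weight of the kernel vector is used as a stand-in for the trail probability. The names `kernel_basis_gf2` and `low_weight_kernel_vector` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def kernel_basis_gf2(rows, ncols):
    """Return a basis of the kernel of a binary matrix over GF(2).

    Each row (and each returned vector) is an int bitmask; bit j is column j.
    """
    pivots = {}  # pivot column -> fully reduced row
    for row in rows:
        # Clear every existing pivot column from the incoming row.
        for col, prow in pivots.items():
            if (row >> col) & 1:
                row ^= prow
        if row:
            col = row.bit_length() - 1
            # Keep stored rows reduced with respect to the new pivot.
            for c in pivots:
                if (pivots[c] >> col) & 1:
                    pivots[c] ^= row
            pivots[col] = row
    free = [c for c in range(ncols) if c not in pivots]
    basis = []
    for f in free:
        v = 1 << f  # set one free variable to 1, the others to 0
        for c, prow in pivots.items():
            if (prow >> f) & 1:
                v |= 1 << c  # dependent variable forced by the free one
        basis.append(v)
    return basis


def low_weight_kernel_vector(rows, ncols, max_free=3):
    """Try all ways of setting 1..max_free free variables to 1.

    Assumption: low Hamming weight is used as a proxy for a good trail.
    """
    basis = kernel_basis_gf2(rows, ncols)
    best = None
    for k in range(1, max_free + 1):
        for combo in combinations(basis, k):
            v = 0
            for b in combo:
                v ^= b  # kernel is linear, so XOR of basis vectors stays inside
            if best is None or bin(v).count("1") < bin(best).count("1"):
                best = v
    return best
```

For a real instance the number of candidates is $\tau + \binom{\tau}{2} + \binom{\tau}{3}$, matching the count given in the text; each candidate would then be scored by its number of bit conditions instead of its plain weight.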


Table 1. The values of (t, y) for the differential path with the best found raw probability 


[Table body not recoverable from the source. Column headings: b = 1, 2, 3, 4, 8, 12, 16, 32, 48, 64; one surviving entry is (2, 637).]


Second preimage attacks on CubeHash. Any differential path with raw probability
greater than $2^{-512}$ can be considered as a (theoretical) second preimage attack on
CubeHash with 512-bit digest size. In Table 1 the entries which do not correspond to a
successful second preimage attack, i.e. $y > 512$, are shown in gray, whereas the others
have been highlighted. For example, our differential path for CubeHash-6/4 with raw
probability $2^{-478}$ indicates that by only one hash evaluation we can produce a second
preimage with probability $2^{-478}$. Alternatively, it can be stated that for a fraction of
$2^{-478}$ messages we can easily provide a second preimage. The list of differential trails
for highlighted entries can be found in the extended version [9].


4.5 Collision Attacks on CubeHash Variants 


Although Table 1 includes our best found differential paths with respect to raw probability,
or equivalently second preimage attack, these trails might not be the optimal ones when it
comes to the use of freedom degrees for a collision attack. In other words, for a specific
r and b, there might be another differential path which is worse in terms of raw probability
but is better regarding the collision attack complexity if we use some freedom
degrees speedup. As an example, for CubeHash-3/48 with the path which has raw
probability $2^{-364}$, using our method of Section 3 the time complexity can be reduced
to about $2^{58.9}$ (partial) evaluations of its condition function. However, there is another
path with raw probability $2^{-368}$ which has a time complexity of about $2^{53.3}$ (partial)
evaluations of its condition function. Table 2 shows the best paths we found regarding the
reduced complexity of the collision attack using our method of Section 3. While most
of the paths are still the optimal ones with respect to the raw probability, the starred
entries indicate the ones which invalidate this property. Some of the interesting differential
paths for starred entries in Table 2 are given in the extended version [9].

Table 3 shows the reduced time complexities of the collision attack using our method of
Section 3 for the differential paths of Table 2. To construct the dependency table, we
have analyzed the Condition function at byte level, see Remark 1. The time complexities
are given as base-2 logarithms and might be improved if the dependency table is analyzed at
bit level instead. The complexity unit is a (partial) evaluation of the respective Condition
function. We recall that the full evaluation of a Condition function corresponding to a
t-iteration differential path costs almost the same as t iterations (rt rounds)
of CubeHash. We emphasize that the complexities are independent of the digest size. All
the complexities which are less than $2^{c/2}$ can be considered as a successful collision
attack if the hash size is bigger than c bits. The complexities bigger than $2^{256}$ have been
shown in gray as they are worse than the birthday attack, considering 512-bit digest size.
The successfully attacked instances have been highlighted.

The astute reader should realize that the complexities of Table 3 correspond to the
optimal threshold value, see Section 3.2. Refer to the extended version [9] to see the
effect of the threshold value on the complexity.


Practice versus theory. We provided a framework which is handy in order to analyze
many hash functions in a generic way. In practice, the optimal threshold value may
be a little different from the theoretical one. Moreover, by slightly playing with the
neighboring bits in the suggested partitioning corresponding to a given threshold value
(Algorithm 2), we may achieve a partitioning which is more suitable for applying the
attacks. In particular, Table 3 contains the theoretical complexities for different CubeHash
instances under the assumption that the Condition function behaves ideally with
respect to the first issue discussed in Section 3. In practice, deviation from this assumption
increases the effective complexity. For particular instances, more simulations
need to be done to analyze the potential non-randomness effects in order to give a more
exact estimation of the practical complexity.


Table 2. The values of (t, y) for the differential path with the best found total complexity (Table 3
includes the reduced complexities using our method of Section 3)

[Table body not recoverable from the source. Surviving entries, positions unknown: (7, 1225), (4, 221), (2, 46), (5, 2614), (3, 964), (3, 195), (2, 189).]

According to Section 4.3, for a given linear difference $\Delta$, we need to find a message
prefix $M^{pre}$ and a conforming message $M$ for collision construction. Our backtracking
(tree-based) search implementation of Algorithm 1 for CubeHash-3/64 finds $M^{pre}$ and
$M$ in $2^{21}$ (median complexity) instead of the $2^{9.4}$ of Table 3. The median decreases to $2^{17}$
by backtracking three steps at each depth instead of one, see Section 3.3. For CubeHash-4/48
we achieve the median complexity $2^{30.4}$, which is very close to the theoretical value
$2^{30.7}$ of Table 3. Collision examples for CubeHash-3/64 and CubeHash-4/48 can be
found in the extended paper [9]. Our detailed analysis of CubeHash variants shows that
the practical complexities for all of them except 3-round CubeHash are very close to
the theoretical values of Table 3. We expect the practical complexities for CubeHash
instances with three rounds to be slightly bigger than the given theoretical numbers. For
detailed comments we refer to the extended paper [9].


Comparison with the previous results. The first analysis of CubeHash was proposed
by Aumasson et al., in which the authors showed some non-random properties for
several versions of CubeHash. A series of collision attacks on CubeHash-1/b and
CubeHash-2/b for large values of b were announced by Aumasson [I] and Dai (J.
Collision attacks were later investigated deeply by Brier and Peyrin [B]. Our results
improve on all existing ones as well as attacking some untouched variants.


Table 3. Theoretical $\log_2$ complexities of improved collision attacks with freedom degrees use at
byte level for the differential paths of Table 2


5 Generalization 


In Sections 2 and 3 we considered modular-addition-based compression functions, which
use only modular additions and linear transformations. Moreover, we concentrated on the
XOR approximation of modular additions in order to linearize the compression function.
This method is, however, quite general and can be applied to a broad class of hash
constructions, covering many of the existing hash functions. Additionally, it lets us
consider other linear approximations as well. We view a compression function
$H = \mathrm{Compress}(M, V) : \{0,1\}^m \times \{0,1\}^v \to \{0,1\}^h$ as a binary finite state machine (FSM).
The FSM has an internal state which is consecutively updated using the message M and
the initial value V. We assume that the FSM operates as follows, and we refer to such Compress
functions as binary-FSM-based. The concept can also cover non-binary fields.

The internal state is initially set to zero. Afterwards, the internal state is sequentially
updated in a limited number of steps. The output value H is then derived by truncating
the final value of the internal state to the specified output size. At each step, the internal
state is updated according to one of these two possibilities: either the whole internal state
is updated as an affine transformation of the current internal state, M and V, or only one
bit of the internal state is updated as a nonlinear Boolean function of the current internal
state, M and V. Without loss of generality, we assume that all of the nonlinear updating
Boolean functions (NUBFs) have zero constant term (i.e. the output of the zero vector is
zero) and that none of the involved variables appears as a pure linear term (i.e. changing any
input variable does not change the output bit with certainty). This assumption, coming
from the simple observation that we can integrate constants and linear terms into an affine
updating transformation (AUT), is essential for our analysis. Linear approximations of
the FSM can be achieved by replacing AUTs with linear transformations, by ignoring
the constant terms, and NUBFs with linear functions of their arguments. Similar to
Section 2, this gives us a linearized version of the compression function, which we denote
by $\mathrm{Compress}_{lin}(M, V)$. As we are dealing with differential cryptanalysis, we take
the notation $\mathrm{Compress}_{lin}(M) = \mathrm{Compress}_{lin}(M, 0)$. The argument given in Section 2
is still valid: elements of the kernel of the linearized compression function (i.e. $\Delta$'s s.t.
$\mathrm{Compress}_{lin}(\Delta) = 0$) can be used to construct differential trails.

Let $n_{nl}$ denote the total number of NUBFs in the FSM. We count the NUBFs by
starting from zero. We introduce four functions $\Lambda(M, V)$, $\Phi(\Delta)$, $\Lambda^{\Delta}(M, V)$ and $\Gamma(\Delta)$,
all of output size $n_{nl}$ bits. To define these functions, consider the two procedures which
implement the FSMs of $\mathrm{Compress}(M, V)$ and $\mathrm{Compress}_{lin}(\Delta)$. Let the Boolean function
$g^k$, $0 \le k < n_{nl}$, stand for the k-th NUBF and denote its linear approximation as
in $\mathrm{Compress}_{lin}$ by $g^k_{lin}$. Moreover, denote the input arguments of the Boolean functions
$g^k$ and $g^k_{lin}$ in the FSMs which compute $\mathrm{Compress}(M, V)$ and $\mathrm{Compress}_{lin}(\Delta)$ by the
vectors $x^k$ and $\delta^k$, respectively. Note that $\delta^k$ is a function of $\Delta$ whereas $x^k$ depends
on M and V. The k-th bit of $\Gamma(\Delta)$, $\Gamma_k(\Delta)$, is set to one iff the argument of the k-th
linearized NUBF is not the all-zero vector, i.e. $\Gamma_k(\Delta) = 1$ iff $\delta^k \ne 0$. We then define
$\Lambda_k(M, V) = g^k(x^k)$, $\Phi_k(\Delta) = g^k_{lin}(\delta^k)$ and $\Lambda^{\Delta}_k(M, V) = g^k(x^k \oplus \delta^k)$. We can then
present the following proposition. The proof is given in the full version of the paper [9].

Proposition 2. Let Compress be a binary-FSM-based compression function. For any
message difference $\Delta$, let $\{i_0, \ldots, i_{y-1}\}$, $0 \le i_0 < i_1 < \cdots < i_{y-1} < n_{nl}$, be the
positions of 1's in the vector $\Gamma(\Delta)$, where $y = \mathrm{wt}(\Gamma(\Delta))$. We define the condition
function $Y = \mathrm{Condition}_{\Delta}(M, V)$, where the j-th bit of Y is computed as

$$Y_j = \Lambda_{i_j}(M, V) \oplus \Lambda^{\Delta}_{i_j}(M, V) \oplus \Phi_{i_j}(\Delta). \qquad (6)$$

Then, if $\Delta$ is in the kernel of $\mathrm{Compress}_{lin}$, $\mathrm{Condition}_{\Delta}(M, V) = 0$ implies that the pair
$(M, M \oplus \Delta)$ is a collision for Compress with the initial value V.


Remark 2. The modular-addition-based compression functions can be implemented as
binary-FSM-based compression functions by considering one bit of memory for the carry bit. All the
NUBFs for this FSM are of the form $g(x, y, z) = xy \oplus xz \oplus yz$. The XOR approximation
of modular addition in Section 2 corresponds to approximating all the NUBFs g by the
zero function, i.e. $g_{lin}(x, y, z) = 0$. It is straightforward to show that
$\Lambda_k(M, V) = g(A_k, B_k, C_k)$ and $\Phi_k(\Delta) = g_{lin}(\alpha_k, \beta_k, 0)$. We then deduce that
$\Gamma_k(\Delta) = \alpha_k \vee \beta_k \vee 0$ and $\Lambda^{\Delta}_k(M, V) = g(A_k \oplus \alpha_k, B_k \oplus \beta_k, C_k \oplus 0)$.
As a result we get

$$\begin{aligned}
Y_j &= \Lambda_{i_j}(M, V) \oplus \Lambda^{\Delta}_{i_j}(M, V) \oplus \Phi_{i_j}(\Delta) \\
    &= (\alpha_{i_j} \oplus \beta_{i_j})\, C_{i_j} \oplus \alpha_{i_j} B_{i_j} \oplus \beta_{i_j} A_{i_j} \oplus \alpha_{i_j} \beta_{i_j}
\end{aligned} \qquad (7)$$

whenever $\alpha_{i_j} \vee \beta_{i_j} = 1$; this agrees with equation (6). Refer to the extended version [9]
for more details and to see how other linear approximations could be used.
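The closed form in (7) can be checked exhaustively, since every quantity involved is a single bit. The sketch below assumes that A and B are the two addend bits, C is the incoming carry bit, and alpha and beta are the corresponding difference bits; the function names are hypothetical.

```python
from itertools import product


def g(x, y, z):
    """Carry function of a full adder: xy XOR xz XOR yz."""
    return (x & y) ^ (x & z) ^ (y & z)


def condition_bit(A, B, C, alpha, beta):
    """Right-hand side of equation (7)."""
    return ((alpha ^ beta) & C) ^ (alpha & B) ^ (beta & A) ^ (alpha & beta)


def remark_holds():
    """Compare both lines of (7) on all 32 bit assignments."""
    for A, B, C, alpha, beta in product((0, 1), repeat=5):
        lam = g(A, B, C)                         # Lambda_k(M, V)
        lam_delta = g(A ^ alpha, B ^ beta, C)    # Lambda^Delta_k(M, V)
        phi = 0                                  # g_lin is the zero function
        if (lam ^ lam_delta ^ phi) != condition_bit(A, B, C, alpha, beta):
            return False
    return True
```

The check covers the case alpha = beta = 0 as well, where both sides vanish; the text only needs the identity when at least one difference bit is set.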


6 Application to MD6 


MD6 [23], designed by Rivest et al., is a SHA-3 candidate that provides security proofs
regarding some differential attacks. The core part of MD6 is the function f, which
works with 64-bit words and maps 89 input words $(A_0, \ldots, A_{88})$ into 16 output words
$(A_{16r+73}, \ldots, A_{16r+88})$ for some integer r representing the number of rounds. Each
round is composed of 16 steps. The function f is computed based on the following
recursion

$$A_{i+89} = L_{r_i,l_i}\bigl(S_i \oplus A_i \oplus (A_{i+71} \wedge A_{i+68}) \oplus (A_{i+58} \wedge A_{i+22}) \oplus A_{i+72}\bigr), \qquad (8)$$

where the $S_i$'s are some publicly known constants and the $L_{r_i,l_i}$'s are some known simple linear
transformations. The 89-word input of f is of the form $Q \| U \| W \| K \| B$, where Q is a
known 15-word constant value, U is a one-word node ID, W is a one-word control word,
K is an 8-word key and B is a 64-word data block. For more details about the function f and
the mode of operation of MD6, we refer to the submission document [23]. We consider
the compression function $H = \mathrm{Compress}(M, V) = f(Q \| U \| W \| K \| B)$, where
$V = U \| W \| K$, $M = B$ and H is the 16-word compressed value. Our goal is to find a collision
$\mathrm{Compress}(M, V) = \mathrm{Compress}(M', V)$ for an arbitrary value of V. We later explain how
such collisions can be translated into collisions for the MD6 hash function.

According to our model (Section 5), MD6 can be implemented as an FSM which
has $64 \times 16r$ NUBFs of the form $g(x, y, z, w) = x \cdot y \oplus z \cdot w$. Remember that
the NUBFs must not include any linear part or constant term. We focus on the case
where we approximate all NUBFs with the zero function. This corresponds to ignoring
the AND operations in equation (8). This essentially says that in order to compute
$\mathrm{Compress}_{lin}(\Delta) = \mathrm{Compress}_{lin}(\Delta, 0)$ for a 64-word $\Delta = (\Delta_0, \ldots, \Delta_{63})$, we map
$(A'_0, \ldots, A'_{24}, A'_{25}, \ldots, A'_{88}) = 0 \| \Delta = (0, \ldots, 0, \Delta_0, \ldots, \Delta_{63})$ into the 16 output
words $(A'_{16r+73}, \ldots, A'_{16r+88})$ according to the linear recursion

$$A'_{i+89} = L_{r_i,l_i}\bigl(A'_i \oplus A'_{i+72}\bigr). \qquad (9)$$

Footnote. In the MD6 document [23], C and $L_{r_i,l_i}$ are respectively denoted by V and $g_{r_i,l_i}$.


For a given $\Delta$, the function $\Gamma$ is the concatenation of the 16r words
$A'_{i+71} \vee A'_{i+68} \vee A'_{i+58} \vee A'_{i+22}$, $0 \le i \le 16r - 1$. Therefore, the number of bit
conditions equals

$$y = \sum_{i=0}^{16r-1} \mathrm{wt}\bigl(A'_{i+71} \vee A'_{i+68} \vee A'_{i+58} \vee A'_{i+22}\bigr). \qquad (10)$$

Note that this equation compactly integrates cases 1 and 2 given in section 6.9.3.2
of [23] for counting the number of active AND gates. Algorithm 3 in the extended version
of this article [9] shows how the Condition function is implemented using
equations (6), (8) and (9).
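Equations (9) and (10) translate directly into a short loop. The following is a sketch only: the per-step linear maps $L_{r_i,l_i}$ are not specified in this text, so they are passed in as a caller-supplied function `L(i, x)`, and the default used here (the identity map) is a placeholder assumption, not the MD6 transformation. The function name `md6_linear_trail_weight` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def md6_linear_trail_weight(delta, rounds, L=lambda i, x: x):
    """Run the linear recursion (9) and count bit conditions as in (10).

    delta  : list of 64 words (ints), the message difference
    rounds : number of rounds r (16 steps each)
    L      : placeholder for the step-dependent linear map; identity by default,
             which is NOT the real MD6 map and is used only for illustration
    Returns (y, last 16 words of the linearized state).
    """
    if len(delta) != 64:
        raise ValueError("delta must contain 64 words")
    a = [0] * 25 + [d & MASK64 for d in delta]   # A'_0..A'_88 = 0 || delta
    y = 0
    for i in range(16 * rounds):
        # Inputs of the two AND gates of step i; any active bit costs a condition.
        active = a[i + 71] | a[i + 68] | a[i + 58] | a[i + 22]
        y += bin(active).count("1")
        # Linear recursion (9): the AND terms and the constant are dropped.
        a.append(L(i, a[i] ^ a[i + 72]) & MASK64)
    return y, a[-16:]
```

A difference lies in the kernel of the linearized function exactly when the returned 16 words are all zero; the returned count is then the number of output bits of the corresponding condition function.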

Using a similar linear algebraic method to the one used in Section 4.4 for CubeHash,
we have found the collision difference of equation (11) for r = 16 rounds with a raw
probability $p_{\Delta} = 2^{-90}$. In other words, $\Delta$ is in the kernel of $\mathrm{Compress}_{lin}$ and the
condition function has y = 90 output bits. Note that this does not contradict the proven
bound in [23]: one gets at least 26 active AND gates.

$$\Delta_i = \begin{cases}
\texttt{F6D164597089C40E} & i = 2, \\
\texttt{2000000000000000} & i = 36, \\
0 & 0 \le i \le 63,\ i \ne 2, 36.
\end{cases} \qquad (11)$$


In order to efficiently find a conforming message pair for this differential path we need
to analyze the dependency table of its condition function. Referring to our notations
in Section 3, our analysis of the dependency table of the function $\mathrm{Condition}_{\Delta}(M, 0)$ at
word level (units of u = 64 bits) shows that the partitioning of the condition function
is as in Table 4 for threshold value th = 0. For this threshold value clearly $p_i = 0$.
The optimal values for the $q_i$'s (computed according to the complexity analysis of the same
section) are also given in Table 4, showing a total attack complexity of $2^{30.6}$ (partial)
condition function evaluations. By analyzing the dependency table with smaller units
the complexity may be subject to reduction.

A collision example for r = 16 rounds of f can be found in the full version [9].
Our 16-round colliding pair provides near collisions for r = 17, 18 and 19 rounds,
respectively, with 63, 144 and 270 bit differences over the 1024-bit long output of f.
Refer to [9] to see how collisions for reduced-round f can be turned into collisions for
the reduced-round MD6 hash function. The original MD6 submission [23] mentions inversion
of the function f up to a dozen rounds using SAT solvers. Some slight nonrandom
behavior of the function f up to 33 rounds has also been reported [I].


> By masking Mss and Mss respectively with 092E9BA68F763BF1 and
DFFBFF7FEFFDFFBF after random setting, the 35 condition bits of the first three
steps are satisfied for free, reducing the complexity to 2°°-° instead.


Table 4. Input and output partitionings of the Condition function of MD6 with r = 16 rounds


{Mss 
{Mo, Ms, Mae, Ms2, Msa} 
{M; |j = 3, 4, 6, 9, 21, 36, 39, 40, 42, 45, 49, 50, 53, 56, 57} 
{Ma1, Ms1, Mss, M59, Meo} 


{M;i = 1; 2;7;8; 10,11, 12, 17, 18, 20, 22, 24, 25, 26, 29, 
33, 34, 37, 43, 44, 47, 48, 61, 62, 63} 
{M27} 
{Mis, Mie, M23} 


35 
{Mia, Mis, Mio, M28} 
{M30, M31, M32} 


7 Conclusion 


We presented a framework for an in-depth study of linear differential attacks on hash 
functions. We applied our method to reduced round variants of CubeHash and MD6, 
giving by far the best known collision attacks on these SHA-3 candidates. Our results 
may be improved by considering start-in-the-middle attacks if the attacker is allowed to
choose the initial value of the internal state.


Abstract. In this paper, we present preimage attacks on up to 43-step
SHA-256 (around 67% of the total 64 steps) and 46-step SHA-512
(around 57.5% of the total 80 steps), which significantly increases the
number of attacked steps compared to the best previously published
preimage attack working for 24 steps. The time complexities are $2^{251.9}$,
$2^{509}$ for finding pseudo-preimages and $2^{254.9}$, $2^{511.5}$ compression function
operations for full preimages. The memory requirements are modest,
around $2^{6}$ words for 43-step SHA-256 and 46-step SHA-512. The
pseudo-preimage attack also applies to 43-step SHA-224 and SHA-384.
Our attack is a meet-in-the-middle attack that uses a range of novel
techniques to split the function into two independent parts that can be
computed separately and then matched in a birthday-style phase.


Keywords: SHA-256, SHA-512, hash, preimage attack, 
meet-in-the-middle. 


1 Introduction 


Cryptographic hash functions are important building blocks of many secure systems.
SHA-1 and SHA-2 (SHA-224, SHA-256, SHA-384, and SHA-512) [1] are
hash functions standardized by the National Institute of Standards and Technology
(NIST) and widely used all over the world. However, a collision attack
on SHA-1 has been discovered recently by Wang et al. [2]. Since the structure of
SHA-2 is similar to SHA-1 and they are both heuristic designs with no known
security guarantees or reductions, an attack on SHA-2 might be discovered in
the future too. To avoid a situation in which all FIPS standardized functions would
be broken, NIST is currently conducting a competition to determine a new hash
function standard called SHA-3 [3]. From the engineering viewpoint, migration
from SHA-1 to SHA-3 will take a long time. SHA-2 will play an important role
during that transitional period. Hence, rigorous security evaluation of SHA-2
using the latest analytic techniques is important.

NIST requires SHA-3 candidates of n-bit hash length to satisfy several
security properties [3], first and foremost


— Preimage resistance of n bits, 

— Second-preimage resistance of n — k bits for any message shorter than 2* 
blocks, 

— Collision resistance of n/2 bits. 


NIST claims that the security of each candidate is evaluated in the environment
where they are tuned so that they run as fast as SHA-2 [4]. It seems that NIST
tries to evaluate each candidate by comparing it with SHA-2. However, the
security of SHA-2 is not well understood yet. Hence, the evaluation of the security
of SHA-2 with respect to the security requirements for SHA-3 candidates is also
important as it may influence our perspective on the SHA-3 speed requirements.

SHA-256 and SHA-512 consist of 64 steps and 80 steps, respectively. The first
analysis of SHA-2 with respect to collision resistance was described by Mendel
et al. [5], who presented a collision attack on SHA-2 reduced to 19 steps.
After that, several researchers improved the result. In particular, the work
by Nikolić and Biryukov improved the collision techniques [6]. The best collision
attacks so far are the ones proposed by Indesteege et al. [7] and Sanadhya and
Sarkar [8], both describing collision attacks for 24 steps. The only analysis of
preimage resistance we are aware of is a recent attack on 24 steps of SHA-2 due
to Isobe and Shibutani [9].

One may note the work announced at the rump session by Yu and Wang [10],
which claimed to have found a non-randomness property of SHA-256 reduced
to 39 steps. Since the non-randomness property is not included in the security
requirements for SHA-3, we do not discuss it in this paper. In summary, the
current best attacks on SHA-2 with respect to the security requirements for
SHA-3 work for only 24 steps.

After Saarinen [11] and Leurent [12] showed examples of meet-in-the-middle
preimage attacks, the techniques for such preimage attacks have been developed
very rapidly. Attacks based on the concept of meet-in-the-middle have been reported
for various hash functions, for example MD5 [13], SHA-1, HAVAL [14],
and so on SOLIS]. The meet-in-the-middle preimage attack has also been applied
to the recently designed hash function ARIRANG [19], which is one of the SHA-3
candidates, by Hong et al. [20]. However, due to the complex message schedule in
SHA-2, these recently developed techniques have not been applied to SHA-2 yet.


Our contribution. We propose preimage attacks on 43-step SHA-256 and 46-step
SHA-512 which drastically increase the number of attacked steps compared
to the previous preimage attack on 24 steps. We first explain various attack
techniques for attacking SHA-2. We then explain how to combine these techniques
to maximize the number of attacked steps. It is interesting that more
steps of SHA-512 can be attacked than of SHA-256 with the so-called partial-fixing
technique proposed by Aoki and Sasaki [15]. This is due to the difference in the
word size, as the functions $\sigma$ and $\Sigma$ mix 32-bit variables in SHA-256 more rapidly
than in the case of double-size variables in SHA-512.

Our attacks are meet-in-the-middle. We first consider the application of the 
previous meet-in-the-middle techniques to SHA-2. We then analyse the message 
expansion of SHA-2 by considering all previous techniques and construct the 
attack by finding new independent message-word partition, which is the funda- 
mental part of this attack. 

Our attacks and a comparison with other results are summarized in Table 1.


Table 1. Comparison of preimage attacks on reduced SHA-2

[Table body not recoverable from the source. Column headings: Reference, Target, Steps, Complexity (pseudo-preimage and preimage), Memory (approx.).]

Outline. In Section 2 we briefly describe SHA-2. Section 3 gives an overview of
the meet-in-the-middle preimage attack. In Section 4 we describe all techniques
of our preimage attack. Then Sections 5 and 6 explain how these techniques can
be applied together to mount an attack on SHA-256 and SHA-512, respectively.
In Section 7 we make some remarks on our attack. Section 8 concludes this paper.


2 SHA-2 Specification 


Description of SHA-256. In this section we describe SHA-256; consult [1] for
full details. SHA-256 adopts the Merkle-Damgård structure [21, Algorithm 9.25].
The message string is first padded with a single “1” bit, an appropriate number of
zero bits and then the 64-bit length of the original message, so that the length of the
padded message is a multiple of 512 bits. It is then divided into 512-bit blocks
$(M_0, M_1, \ldots, M_{N-1})$, where $M_i \in \{0,1\}^{512}$.

The hash value $h_N$ is computed by iteratively using the compression function
CF, which takes a 512-bit message block and a 256-bit chaining variable as the
input and yields an updated 256-bit chaining variable as the output,

$$h_0 \leftarrow IV, \qquad h_{i+1} \leftarrow \mathrm{CF}(h_i, M_i) \quad (i = 0, 1, \ldots, N-1), \qquad (1)$$

where IV is a constant value defined in the specification.

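The compression function is based on the Davies-Meyer mode [21, Algorithm 9.42].
It consists of a message expansion and a data processing. Let $\gg x$ and $\ggg x$
denote the x-bit right shift and rotation, respectively. First, the message block
is expanded by the message expansion function,

$$W_i \leftarrow \begin{cases}
m_i & \text{for } 0 \le i < 16, \\
\sigma_1(W_{i-2}) + W_{i-7} + \sigma_0(W_{i-15}) + W_{i-16} & \text{for } 16 \le i < 64,
\end{cases} \qquad (2)$$

where $(m_0, m_1, \ldots, m_{15}) \leftarrow M_i$ ($m_j \in \{0,1\}^{32}$) and “+” denotes addition modulo
$2^{\text{word size}}$. In SHA-256 the word size is 32 bits. Functions $\sigma_0(X)$ and $\sigma_1(X)$
are defined as

$$\begin{aligned}
\sigma_0(X) &= (X^{\ggg 7}) \oplus (X^{\ggg 18}) \oplus (X^{\gg 3}), \\
\sigma_1(X) &= (X^{\ggg 17}) \oplus (X^{\ggg 19}) \oplus (X^{\gg 10}),
\end{aligned} \qquad (3)$$

where “$\oplus$” stands for the bitwise XOR operation.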

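Let us use $p_j$ to denote a 256-bit value consisting of the concatenation of eight
words $A_j, B_j, C_j, D_j, E_j, F_j, G_j$ and $H_j$. The data processing computes $h_{i+1}$ as
follows.

$$\begin{aligned}
p_0 &\leftarrow h_i, \\
p_{j+1} &\leftarrow R_j(p_j, W_j) \quad (j = 0, 1, \ldots, 63), \\
h_{i+1} &\leftarrow h_i + p_{64}.
\end{aligned} \qquad (4)$$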

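Step function $R_j$ is defined as follows

$$\begin{aligned}
T_1^{(j)} &\leftarrow H_j + \Sigma_1(E_j) + \mathrm{Ch}(E_j, F_j, G_j) + K_j + W_j, \\
T_2^{(j)} &\leftarrow \Sigma_0(A_j) + \mathrm{Maj}(A_j, B_j, C_j), \\
A_{j+1} &\leftarrow T_1^{(j)} + T_2^{(j)}, \quad B_{j+1} \leftarrow A_j, \quad C_{j+1} \leftarrow B_j, \quad D_{j+1} \leftarrow C_j, \\
E_{j+1} &\leftarrow D_j + T_1^{(j)}, \quad F_{j+1} \leftarrow E_j, \quad G_{j+1} \leftarrow F_j, \quad H_{j+1} \leftarrow G_j.
\end{aligned} \qquad (5)$$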

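Above, $K_j$ is a constant, different for each step, and the following functions are
used

$$\begin{aligned}
\mathrm{Ch}(X, Y, Z) &= (X \wedge Y) \oplus ((\neg X) \wedge Z), \\
\mathrm{Maj}(X, Y, Z) &= (X \wedge Y) \oplus (X \wedge Z) \oplus (Y \wedge Z), \\
\Sigma_0(X) &= (X^{\ggg 2}) \oplus (X^{\ggg 13}) \oplus (X^{\ggg 22}), \\
\Sigma_1(X) &= (X^{\ggg 6}) \oplus (X^{\ggg 11}) \oplus (X^{\ggg 25}),
\end{aligned} \qquad (6)$$

where $\neg$ means bitwise negation of the word.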
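The description above can be turned into a small executable sketch of the SHA-256 compression function CF. Two choices here are assumptions of this example rather than part of the text: the constants $K_j$ and IV are derived from the fractional parts of cube and square roots of the first primes (as the SHA-2 standard defines them) instead of being listed, and only messages that fit into a single padded block are handled. The helper names are hypothetical.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK for p in PRIMES]       # step constants K_j
IV = [isqrt(p << 64) & MASK for p in PRIMES[:8]]  # initial chaining value


def expand(block):
    """Message expansion, equation (2)."""
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((s1 + w[i - 7] + s0 + w[i - 16]) & MASK)
    return w


def compress(h, block):
    """Data processing, equations (4) to (6), with the feed-forward addition."""
    w = expand(block)
    a, b, c, d, e, f, g, hh = h
    for j in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        ch = (e & f) ^ (~e & MASK & g)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t1 = (hh + big_s1 + ch + K[j] + w[j]) & MASK
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]


def sha256_single_block(msg):
    """Hash a message of at most 55 bytes (one padded block)."""
    if len(msg) > 55:
        raise ValueError("sketch handles single-block messages only")
    padded = msg + b"\x80" + b"\x00" * (55 - len(msg)) + (8 * len(msg)).to_bytes(8, "big")
    return b"".join(x.to_bytes(4, "big") for x in compress(IV, padded))


def matches_hashlib(msg):
    return sha256_single_block(msg) == hashlib.sha256(msg).digest()
```

Comparing against `hashlib` is a convenient sanity check that the rotation amounts and the register shuffling in (5) were transcribed correctly.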


Description of SHA-512. The structure of SHA-512 is basically the same as
SHA-256. In SHA-512, the word size is 64 bits, double that of SHA-256; hence, the
message-block size is 1024 bits and the size of the chaining variable $p_j$ is 512 bits.
The compression function has 80 steps. Rotation numbers in $\sigma_0$, $\sigma_1$, $\Sigma_0$, and $\Sigma_1$
are different from those used in SHA-256, and are shown below.

$$\begin{aligned}
\sigma_0(X) &= (X^{\ggg 1}) \oplus (X^{\ggg 8}) \oplus (X^{\gg 7}), \\
\sigma_1(X) &= (X^{\ggg 19}) \oplus (X^{\ggg 61}) \oplus (X^{\gg 6}), \\
\Sigma_0(X) &= (X^{\ggg 28}) \oplus (X^{\ggg 34}) \oplus (X^{\ggg 39}), \\
\Sigma_1(X) &= (X^{\ggg 14}) \oplus (X^{\ggg 18}) \oplus (X^{\ggg 41}).
\end{aligned} \qquad (7)$$


3 Overview of the Meet-in-the-Middle Preimage Attack 


A preimage attack on a narrow-pipe Merkle-Damgård hash function is usually
based on a pseudo-preimage attack on its underlying compression function, where
a pseudo-preimage is a preimage of the compression function with an appropriate
padding. Many compression functions adopt the Davies-Meyer mode, which
computes $E_u(v) \oplus v$, where u is the message, v is the intermediate hash value
and E is a block cipher.

First we recall the attack strategy on a compression function, which is
illustrated in Fig. 1. Denote by h the given target hash value. The high-level
description of the attack for the simplest case is as follows.


1. Divide the key u of the block cipher E into two independent parts: $u_1$ and
$u_2$. Hereafter, independent parts are called “chunks” and independent inputs
$u_1$ and $u_2$ are called “neutral words”.

2. Randomly determine the other input value v of the block cipher E.

3. Carry out the forward calculation utilizing v and all possible values of $u_1$,
and store all the obtained intermediate values in a table $T_F$.

4. Carry out the backward calculation utilizing $h \oplus v$ and all possible values of
$u_2$, and store all the intermediate values in a table $T_B$.

5. Check whether there exists a collision between $T_F$ and $T_B$. If a collision
exists, a pseudo-preimage of h has been generated. Otherwise, go to Step 2.
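The five steps above can be illustrated on a toy Davies-Meyer function. Everything in this sketch is an assumption made for illustration: the 16-bit cipher, its round function, and the fact that the first four rounds use only $u_1$ while the last four use only $u_2$ (a perfectly separable key, which is the simplest case). It is not related to SHA-2. For brevity, $T_B$ is not stored; each backward value is looked up in $T_F$ directly, and v is enumerated instead of chosen at random.

```python
MASK16 = 0xFFFF
MUL = 0x9E37                      # odd, hence invertible modulo 2**16
MUL_INV = pow(MUL, -1, 1 << 16)   # modular inverse (Python 3.8+)


def rotl16(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK16


def rotr16(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK16


def round_fwd(s, k, i):
    s = (s + 257 * k + i) & MASK16
    s = (s * MUL) & MASK16
    return rotl16(s, 5)


def round_bwd(s, k, i):
    """Exact inverse of round_fwd for the same key byte and round index."""
    s = rotr16(s, 5)
    s = (s * MUL_INV) & MASK16
    return (s - 257 * k - i) & MASK16


def toy_compress(u1, u2, v):
    """Davies-Meyer: E_u(v) XOR v with key u = (u1, u2)."""
    s = v
    for i in range(4):
        s = round_fwd(s, u1, i)      # first chunk depends on u1 only
    for i in range(4, 8):
        s = round_fwd(s, u2, i)      # second chunk depends on u2 only
    return s ^ v


def mitm_pseudo_preimage(h):
    for v in range(1 << 16):                     # Step 2
        table_f = {}
        for u1 in range(256):                    # Step 3: forward chunk
            s = v
            for i in range(4):
                s = round_fwd(s, u1, i)
            table_f[s] = u1
        for u2 in range(256):                    # Step 4: backward chunk
            s = h ^ v                            # cipher output implied by h
            for i in range(7, 3, -1):
                s = round_bwd(s, u2, i)
            if s in table_f:                     # Step 5: match in the middle
                return table_f[s], u2, v
    return None
```

With 8-bit chunks and a 16-bit state, each choice of v yields about one match on average, so the search typically stops after a handful of values of v instead of scanning all $2^{16}$ key pairs for each one.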


The main novelty of the meet-in-the-middle preimage attacks is that, by utilizing the
independence of $u_1$ and $u_2$ of the key input, they transform the problem of finding
a preimage of h into the problem of finding a collision on the intermediate
values, which has a much lower complexity than the former one. Suppose there
are $2^d$ possible values for each of $u_1$ and $u_2$. Using $2^d$ compression function
computations, the attacker obtains $2^d$ elements in each of $T_F$ and $T_B$. The collision
probability is roughly $2^{2d-n}$, where n is the bit length of h, much better than the
probability $2^{d-n}$ of finding a preimage by a brute force search with complexity $2^d$.


Fig. 1. Meet-in-the-middle attack strategy on a Davies-Meyer compression function
$E_u(v) \oplus v$


4 The List of Attack Techniques 


This section describes the list of techniques used in the attack. Some of them
were used before in previous meet-in-the-middle attacks [5E3606]. We explain
them here first and then in Sections 5 and 6 we show how to combine them in
an attack on SHA-2.


4.1 Splice-and-Cut 


The meet-in-the-middle attack starts with dividing the key input into two independent
parts. The idea of splice-and-cut is based on the observation
that the last and first steps of the block cipher E in Davies-Meyer mode
can be regarded as consecutive by considering the feed-forward operation.

This allows the attacker to choose any step as the starting step of the meet- 
in-the-middle, which helps with finding more suitable independent chunks. 

This technique can find only pseudo-preimages of the given hash value instead 
of preimages. However, pseudo-preimages can be converted to preimages with a 
conversion algorithm explained below. 


4.2 Converting Pseudo-preimages to Preimages 


In x-bit iterated hash functions, a pseudo-preimage attack with complexity
$2^y$, $y < x - 2$, can be converted to a preimage attack with complexity of $2^{(x+y)/2+1}$
[21, Fact 9.99]. The idea is to apply the unbalanced meet-in-the-middle attack
by generating $2^{(x-y)/2}$ pseudo-preimages and generating $2^{(x+y)/2}$ 1-block
chaining variables starting from IV.
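The conversion cost is simple arithmetic on exponents, and a short helper makes the bookkeeping explicit. The function name and the returned dictionary keys are hypothetical; the formula is the one stated above. The two example inputs are the pseudo-preimage complexities quoted in the abstract.

```python
def pseudo_to_preimage(x, y):
    """Return log2 figures for converting a pseudo-preimage attack.

    x : hash length in bits
    y : log2 complexity of one pseudo-preimage, must satisfy y < x - 2
    """
    if not y < x - 2:
        raise ValueError("conversion gives no gain unless y < x - 2")
    return {
        # number of pseudo-preimages to collect: 2^((x - y) / 2)
        "log2_pseudo_preimages": (x - y) / 2,
        # number of 1-block chaining values from IV: 2^((x + y) / 2)
        "log2_chaining_values": (x + y) / 2,
        # both phases cost 2^((x + y) / 2), hence the extra factor 2
        "log2_total": (x + y) / 2 + 1,
    }


sha256 = pseudo_to_preimage(256, 251.9)   # total exponent about 254.95
sha512 = pseudo_to_preimage(512, 509)     # total exponent 511.5
```

Both phases are balanced: $2^{(x-y)/2}$ pseudo-preimages at $2^y$ each cost $2^{(x+y)/2}$, the same as the chaining-value phase, which is where the final "+1" in the exponent comes from.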


4.3 Partial-Matching 


The example in Fig. 1 is the simplest and most optimistic case. In fact, in the
previous attacks, the key input cannot be divided into just two independent
chunks. Usually, besides the two independent chunks $u_1$ and $u_2$, there is
another part which depends on both $u_1$ and $u_2$. Hence, the intermediate
values stored in $T_F$ and $T_B$ are values at different steps. This raises a
problem: how can the values in $T_F$ and $T_B$ be compared? However, many hash
functions, including SHA-2, have an Unbalanced Feistel Network structure, where
the intermediate values are only partially updated at each step. This means
that a part of the intermediate values does not change during several steps,
and the attacker can partially check whether two values match.

Consider SHA-2, and assume one chunk produces the value of $p_j$ and the other
chunk produces the value of $p_{j+s}$. The attacker wants to efficiently check
whether or not $p_j$ and $p_{j+s}$ match without the knowledge of
$W_j, W_{j+1}, \ldots, W_{j+s-1}$. In SHA-2, the maximum value of $s$ is 7.

Assume the value of $p_{j+7} = A_{j+7} \| B_{j+7} \| \cdots \| H_{j+7}$ is known
and $W_{j+6}$ is unknown. By backward computation, we can obtain the values of
$A_{j+6}, B_{j+6}, \ldots, G_{j+6}$. This is because $A_{j+6}$, $B_{j+6}$,
$C_{j+6}$, $E_{j+6}$, $F_{j+6}$, and $G_{j+6}$ are just copies of the
corresponding values in $p_{j+7}$, and $D_{j+6}$ is computed as follows.


$$D_{j+6} = E_{j+7} - (A_{j+7} - (\Sigma_0(B_{j+7}) + \mathrm{Maj}(B_{j+7}, C_{j+7}, D_{j+7}))). \qquad (8)$$


By repeating similar computations, in the end, $A_j$ is computed from $p_{j+7}$
without the knowledge of $W_j, W_{j+1}, \ldots, W_{j+6}$. Note that this
technique was already used (but not explicitly named) in [9].
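The backward partial computation can be checked experimentally. The sketch below assumes the standard SHA-256 step function (32-bit words); the helper names and the convention of marking an unknown register with `None` are illustrative choices, not notation from the paper.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit words, as in SHA-256


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(x, y, z):
    return (x & y) ^ (~x & z & MASK)


def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def step_forward(state, k, w):
    """One SHA-256 step: maps p_t to p_{t+1} using constant k and word w."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def step_backward_partial(known):
    """Invert one step WITHOUT the message word.

    `known` holds the registers A..H of p_{t+1}; None marks an unknown
    register. Returns the registers of p_t, again with None for unknowns.
    """
    a1, b1, c1, d1, e1, f1, g1, h1 = known
    # A, B, C, E, F, G of p_t are plain copies; H of p_t needs the message word.
    prev = [b1, c1, d1, None, f1, g1, h1, None]
    if None not in (a1, b1, c1, d1, e1):
        # Equation (8): D_t = E_{t+1} - (A_{t+1} - (Sigma0(B_{t+1}) + Maj(...)))
        t2 = (big_sigma0(b1) + maj(b1, c1, d1)) & MASK
        prev[3] = (e1 - ((a1 - t2) & MASK)) & MASK
    return prev


def recover_without_message_words(p_j7):
    """Apply the partial inversion 7 times, starting from a fully known p_{j+7}."""
    known = list(p_j7)
    for _ in range(7):
        known = step_backward_partial(known)
    return known
```

After seven partial inversions only register $A_j$ is still known, which is why $s = 7$ is the limit for this technique.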


4.4 Partial-Fixing 


This is an extension of the partial-matching technique that considers parts of
registers of the internal state. It increases the number of steps that can exist
between two independent chunks. Assume that the attacker is carrying out the
computation using $u_1$ and faces a step whose key input depends on both
$u_1$ and $u_2$. Because the computation cannot go ahead without the knowledge
of $u_2$, the chunk for $u_1$ must stop at this step. The partial-fixing
technique partially fixes the values of $u_1$ and $u_2$ so that we can obtain
partial knowledge even if the full computation depends on both $u_1$ and $u_2$.

The partial-fixing technique for SHA-2 has not been considered previously.
Assume we can fix the lower $x$ bits of the message word in each step. Under
this assumption, 1 step can be partially computed easily. Let us consider the
step function of SHA-2 in the forward direction. The equations using $W_j$ are
as follows, where $T_2^{(j)} = \Sigma_0(A_j) + \mathrm{Maj}(A_j, B_j, C_j)$ does
not involve $W_j$.

$$T_1^{(j)} = H_j + \Sigma_1(E_j) + \mathrm{Ch}(E_j, F_j, G_j) + K_j + W_j, \qquad (9)$$
$$A_{j+1} = T_1^{(j)} + T_2^{(j)}, \qquad E_{j+1} = D_j + T_1^{(j)}.$$


If the lower $x$ bits of $W_j$ are fixed, the lower $x$ bits of $A_{j+1}$ (and
$E_{j+1}$) can be computed independently of the upper $32 - x$ bits of $W_j$.
Let us consider skipping another step in the forward direction. The equation
for $A_{j+2}$ is as follows:


$$A_{j+2} = T_1^{(j+1)} + \Sigma_0(A_{j+1}) + \mathrm{Maj}(A_{j+1}, B_{j+1}, C_{j+1}). \qquad (10)$$


We know only the lower $x$ bits of $A_{j+1}$. Hence, we can compute the Maj
function for only the lower $x$ bits. How about the $\Sigma_0$ function? We
analysed the relationship between the number of consecutive fixed bits from the
LSB in the input and in the output of $\sigma_0$, $\sigma_1$, $\Sigma_0$, and
$\Sigma_1$. The results are summarized in Table 2.

From Table 2, if $x$ is large enough, we can compute the lower $x - 22$ bits of
$A_{j+2}$ in SHA-256 and the lower $x - 39$ bits in SHA-512, though the number
of known bits is greatly reduced after the $\Sigma_0$ function. This fact also
implies that we cannot obtain the value of $A_{j+3}$, since the number of fixed
bits will always be 0. In the end, the partial-fixing technique can be applied
for up to 2 steps in the forward direction. Similarly, we considered the
partial-fixing technique in the backward direction, and found that it can be
applied for up to 6 steps.

Table 2. Relationship of number of consecutive fixed bits from LSB in input and
output of $\sigma$ and $\Sigma$

          SHA-256                          SHA-512
          Σ0      Σ1      σ0      σ1       Σ0      Σ1      σ0      σ1
output    x - 22  x - 25  x - 18  x - 19   x - 39  x - 41  x - 8   x - 61

When $x$ agrees with the word size, the output is $x$. When the number described
in the output is negative, the output is 0.
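The entries of Table 2 follow from the rotation and shift amounts of each function. The sketch below recomputes them, assuming the rotation and shift constants of the SHA-2 standard (FIPS 180-4); the dictionary layout and function name are illustrative.

```python
# (rotations, shifts) per function, taken from the SHA-2 standard.
FUNCTIONS = {
    ("SHA-256", "Sigma0"): ((2, 13, 22), ()),
    ("SHA-256", "Sigma1"): ((6, 11, 25), ()),
    ("SHA-256", "sigma0"): ((7, 18), (3,)),
    ("SHA-256", "sigma1"): ((17, 19), (10,)),
    ("SHA-512", "Sigma0"): ((28, 34, 39), ()),
    ("SHA-512", "Sigma1"): ((14, 18, 41), ()),
    ("SHA-512", "sigma0"): ((1, 8), (7,)),
    ("SHA-512", "sigma1"): ((19, 61), (6,)),
}
WORD_SIZE = {"SHA-256": 32, "SHA-512": 64}


def known_low_output_bits(variant, name, x):
    """Number of consecutive low output bits determined by the low x input bits."""
    word = WORD_SIZE[variant]
    rotations, shifts = FUNCTIONS[(variant, name)]
    if x >= word:
        return word  # the whole input is known
    count = 0
    for i in range(word):
        # Output bit i reads input bit (i + r) mod word for each right-rotation r,
        # and input bit i + s for each right-shift s (zero if it falls off the word).
        sources = [(i + r) % word for r in rotations]
        sources += [i + s for s in shifts if i + s < word]
        if all(src < x for src in sources):
            count += 1
        else:
            break  # only consecutive bits from the LSB are counted
    return count
```

For instance, `known_low_output_bits("SHA-512", "sigma0", 60)` gives 52, i.e. $x - 8$, and any input with fewer known bits than the largest rotation amount yields 0.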

However, there is a problem with the first assumption, namely that the lower
$x$ bits of each message word can be fixed. This is difficult to achieve because
the fixed bits in message words are mixed by the $\sigma$ functions in the
message expansion. In fact, for SHA-256 we could apply the partial-fixing
technique for computing only 1 step in forward and only 2 steps in backward.
In SHA-512, however, the bit-mixing speed of $\sigma$ is relatively slow due to
the double word size, and we could compute 2 steps in forward and 6 steps in
backward. Finally, 10 steps in total can be skipped by the partial-matching and
partial-fixing techniques for SHA-256, and 15 steps for SHA-512. (These numbers
of steps are explained in Sections 5 and 6.)


4.5 Indirect-Partial-Matching 


This is another extension of partial-matching. Consider the intermediate values
in $T_F$ and $T_B$. We can express them as functions of $u_1$ and $u_2$,
respectively. If the next message word used in the forward direction can be
expressed as $\psi_1(u_1) + \xi_2(u_2)$ and the computation of the chaining
register at the matching point does not destroy this relation (because the
message word is also added), the matching point can still be expressed as a sum
of two independent functions of $u_1$, $u_2$, e.g. $\psi_F(u_1) + \xi_F(u_2)$.
Similarly, we can express the matching point from backward as
$\psi_B(u_1) + \xi_B(u_2)$, and we are to find a match. Now, instead of finding
a match directly, we can compute $\psi_F(u_1) - \psi_B(u_1)$ in the forward
direction and $\xi_B(u_2) - \xi_F(u_2)$ in the backward direction independently
and find a match.
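The rearrangement can be illustrated with a small table lookup. This is a toy sketch: the four functions below are arbitrary stand-ins chosen for illustration, not the functions arising in SHA-2, and the `bits` parameter models matching on only a few low-order bits.

```python
import random

MASK = 0xFFFFFFFF


# Hypothetical stand-ins for the four independent functions.
def psi_f(u1):
    return (u1 * 0x9E3779B1) & MASK


def psi_b(u1):
    return (u1 ^ (u1 >> 7)) & MASK


def xi_f(u2):
    return (u2 + 0x12345678) & MASK


def xi_b(u2):
    return (u2 * 5 + 1) & MASK


def ipm_matches(u1_values, u2_values, bits=4):
    """Find (u1, u2) with psi_f(u1) + xi_f(u2) == psi_b(u1) + xi_b(u2)
    on the low `bits` bits, without ever evaluating a mixed expression."""
    low = (1 << bits) - 1
    table = {}
    for u1 in u1_values:
        # Depends on u1 only: psi_F(u1) - psi_B(u1).
        key = (psi_f(u1) - psi_b(u1)) & low
        table.setdefault(key, []).append(u1)
    matches = []
    for u2 in u2_values:
        # Depends on u2 only: xi_B(u2) - xi_F(u2).
        key = (xi_b(u2) - xi_f(u2)) & low
        for u1 in table.get(key, []):
            matches.append((u1, u2))
    return matches
```

Because subtraction modulo $2^{32}$ commutes with truncation to the low bits, the lookup finds exactly the pairs that a direct comparison of both sums would find.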

In case of SHA-2, it is possible to extend the 7-step partial-matching to 9-step 
indirect-partial-matching by inserting one step just before and after the partial 
matching. 

Note that this technique can be combined with the partial-fixing technique by
applying them in order: partial-fixing, partial-matching and
indirect-partial-matching. However, there are some constraints that need to be
satisfied, such as the independence of the message word used in
indirect-partial-matching, while we need to be able to compute enough bits at
the matching point in order to carry out the partial-matching efficiently.


4.6 Initial Structure


In some cases, the two independent chunks $u_1$ and $u_2$ will overlap with each
other. The typical example is that the order of the input key of $E$ is
$u_1 u_2 u_1 u_2$. This creates a problem: how should the attacker carry out the
forward and backward computations independently? The Initial Structure
technique was previously proposed to solve such a problem. Previous attacks
usually set a certain step as the starting step, then randomly determine the
intermediate value at that step, and carry out the independent computations.
However, the initial structure technique sets all the steps of $u_2 u_1$ in the
middle of $u_1 u_2 u_1 u_2$ together as the starting point. Denote the
intermediate values at the beginning and last step of $u_2 u_1$ as $I_1$ and
$I_2$, respectively. For each possible value of $u_1$, the attacker can derive a
corresponding value $I_1$. Similarly, for each possible value of $u_2$, the
attacker can derive a corresponding value $I_2$. Moreover, any pair
$(I_1, u_1)$ and $(I_2, u_2)$ can be matched at the steps of $u_2 u_1$ of
$u_1 u_2 u_1 u_2$. Thus, the attacker can carry out independent computations
utilizing $(I_1, u_1)$ and $(I_2, u_2)$.

The initial structure for SHA-2 makes use of the absorption property of the
function $\mathrm{Ch}(x, y, z) = xy \oplus (\neg x)z$. If $x$ is 1 (all bits are
1), then $\mathrm{Ch}(1, y, z) = y$, which means $z$ does not affect the result
of the Ch function in this case; similarly, when $x$ is 0 (all bits are 0), $y$
does not affect the result. When we want to control only part of the output (a
few bits), we need to fix only the corresponding bits of $x$ instead of all bits
of $x$.

We consider 4 consecutive step functions, i.e. from step $i$ to step $i+3$. We
show that, under certain conditions, we can move the last message word
$W_{i+3}$ to step $i$ and move $W_i$ to step $i+1$ while keeping the final
output after step $i+3$ unchanged.

Assume we want to transfer upwards a message word $W_{i+3}$. Due to the
absorption property of Ch, we can move $W_{i+3}$ to step $i+2$ (adding it to
register $G_{i+2}$) if all the bits of $E_{i+2}$ are fixed to 1. This is
illustrated in Fig. 2 (left). Similarly, we can further move $W_{i+3}$ to step
$i+1$ (adding it to register $F_{i+1}$) if all the bits of $E_{i+1}$ are 0.
Then, we can still move it upwards by transferring it to register $E_i$ after
the step transformation in step $i$.

The same principle applies if we want to transfer only part of the register
$W_{i+3}$. If the $l$ most significant bits (MSB) of $W_{i+3}$ are arbitrary and
the rest is set to zero (to avoid interference with addition on least
significant bits), we need to fix the $l$ MSB of $E_{i+2}$ to one and the $l$
MSB of $E_{i+1}$ to zero.

As the $l$ MSB of $E_{i+1}$ need to be 0, we need to use the $l$ MSB of $W_i$ to
satisfy this requirement. This reduces the space of $W_i$ to $2^{32-l}$.
Similarly, we need to choose those $W_i$ that fix the $l$ MSB of $E_{i+2}$ to
one. This is possible because changing the value of $W_i$ influences the state
of register $E_{i+2}$ through $\Sigma_1$ at step $i+1$. We experimentally
checked that changing $W_i$ generates changes in $E_{i+2}$ that are sufficiently
close to uniformly distributed. Satisfying additional constraints on $l$ bits
further reduces the space of $W_i$ to $2^{32-2l}$.

The important thing to note here is that if we fix the values of $F_{i+1}$,
$G_{i+1}$ and of the sum $D_{i+1} + H_{i+1}$, we can precompute the set of good
values for $W_i$ and store them in a table. Then, we can later recall them at
negligible cost.

Fig. 2. Initial structure for SHA-2 allows to move the addition of $W_{i+3}$
upwards provided that the Ch functions absorb the appropriate inputs (left);
move $W_i$ one step downwards (right)


On the other hand, message word $W_i$ can be moved to step $i+1$ with no
constraint, as shown in Fig. 2 (right).
This procedure essentially swaps the order of words $W_i$ and $W_{i+3}$.
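The upward move of $W_{i+3}$ can be verified on the SHA-256 step function. The sketch below assumes the standard step function and covers only the full-word case with $E_{i+1} = 0$ and $E_{i+2}$ all ones; the way $W_{i+1}$ is chosen here to force $E_{i+2}$ is an illustrative shortcut, not the table-based procedure of the paper.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(x, y, z):
    return (x & y) ^ (~x & z & MASK)


def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def step(state, k, w):
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def run(state, constants_and_words):
    for k, w in constants_and_words:
        state = step(state, k, w)
    return state


def move_word_up_demo(seed):
    """Compare steps i+1..i+3 with W_{i+3} used normally against the same
    steps with W_{i+3} added to F_{i+1} instead (and 0 used in step i+3)."""
    rng = random.Random(seed)
    a, b, c, d, f, g, h = (rng.getrandbits(32) for _ in range(7))
    e = 0  # condition: all bits of E_{i+1} are 0, so Ch absorbs F_{i+1}
    k1, k2, k3, w2, w3 = (rng.getrandbits(32) for _ in range(5))
    # Choose W_{i+1} so that E_{i+2} = D_{i+1} + T1 becomes all ones.
    w1 = (MASK - (d + h + big_sigma1(e) + ch(e, f, g) + k1)) & MASK
    original = run((a, b, c, d, e, f, g, h), [(k1, w1), (k2, w2), (k3, w3)])
    moved = run((a, b, c, d, e, (f + w3) & MASK, g, h),
                [(k1, w1), (k2, w2), (k3, 0)])
    return original, moved
```

Both computations end in the same state after step $i+3$, since the added word travels unchanged from $F_{i+1}$ through $G_{i+2}$ to $H_{i+3}$, where it enters the same modular sum that $W_{i+3}$ would.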


4.7 Two-Way Expansion 


Message expansion usually works in such a way that a certain number of
consecutive message words determine the rest. For SHA-2, any 16 consecutive
message words determine the rest, since the message expansion is a bijective
mapping. This enables us to control any 16 intermediate message words and then
expand the rest in both directions. This technique gives us more freedom in the
choice of neutral words, and considerably extends the number of steps for the
two chunks. Note that the maximum number of consecutive steps for the two
chunks is 30 for SHA-2. Since the message expansion is a bijective mapping, no
matter which neutral word is chosen, it must be used to compute at least one of
any 16 consecutive message words. So each chunk of consecutive steps is of
length at most 15.
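Expanding in both directions from an arbitrary 16-word window can be sketched as follows, assuming the SHA-256 message expansion (32-bit words, 64 expanded words); the function name and argument layout are illustrative.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def expand_two_way(window, start, total=64):
    """Given the 16 words W[start..start+15], return all `total` expanded words."""
    if len(window) != 16 or start < 0 or start + 16 > total:
        raise ValueError("need 16 words that fit inside the schedule")
    w = [None] * total
    w[start:start + 16] = window
    # Forward: W_t = sigma1(W_{t-2}) + W_{t-7} + sigma0(W_{t-15}) + W_{t-16}
    for t in range(start + 16, total):
        w[t] = (small_sigma1(w[t - 2]) + w[t - 7]
                + small_sigma0(w[t - 15]) + w[t - 16]) & MASK
    # Backward: W_t = W_{t+16} - sigma1(W_{t+14}) - W_{t+9} - sigma0(W_{t+1})
    for t in range(start - 1, -1, -1):
        w[t] = (w[t + 16] - small_sigma1(w[t + 14])
                - w[t + 9] - small_sigma0(w[t + 1])) & MASK
    return w
```

Calling `expand_two_way(block, 0)` reproduces the usual forward expansion, and feeding any 16 consecutive words of that result back in with the matching `start` recovers the same schedule.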


4.8 Message Compensation 


For some choices of neutral words, the two chunks are not able to achieve the
optimal length. By forcing some of the other message words to cancel the change
introduced by the neutral words, the optimal or near-optimal length can be
achieved.

Combining the initial structure, two-way expansion and message compensation
techniques, we are able to find two chunks of length 33. We choose to control
$\{W_z, \ldots, W_{z+15}\}$, for some $z$ which we will determine later. We
choose $W_{z+5}$ and $W_{z+8}$ as neutral words. We show the first chunk
$\{W_{z-10}, \ldots, W_{z+4}, W_{z+8}\}$ to be independent from $W_{z+5}$ and
the second chunk $\{W_{z+5}, W_{z+6}, W_{z+7}, W_{z+9}, \ldots, W_{z+22}\}$ to
be independent from $W_{z+8}$. Note that $W_{z+8}$ is "moved" to the first chunk
by the method explained for the initial structure. For forward direction, we
need to show that $\{W_{z-10}, \ldots, W_{z-1}\}$ are independent from $W_{z+5}$
when they are expanded from $\{W_z, \ldots, W_{z+15}\}$.


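$$W_{z-1} = W_{z+15} - \sigma_1(W_{z+13}) - W_{z+8} - \sigma_0(W_z), \qquad (11)$$
$$W_{z-2} = W_{z+14} - \sigma_1(W_{z+12}) - W_{z+7} - \sigma_0(W_{z-1}), \qquad (12)$$
$$W_{z-3} = W_{z+13} - \sigma_1(W_{z+11}) - W_{z+6} - \sigma_0(W_{z-2}), \qquad (13)$$
$$W_{z-4} = W_{z+12} - \sigma_1(W_{z+10}) - W_{z+5} - \sigma_0(W_{z-3}), \qquad (14)$$
$$W_{z-5} = W_{z+11} - \sigma_1(W_{z+9}) - W_{z+4} - \sigma_0(W_{z-4}), \qquad (15)$$
$$W_{z-6} = W_{z+10} - \sigma_1(W_{z+8}) - W_{z+3} - \sigma_0(W_{z-5}), \qquad (16)$$
$$W_{z-7} = W_{z+9} - \sigma_1(W_{z+7}) - W_{z+2} - \sigma_0(W_{z-6}), \qquad (17)$$
$$W_{z-8} = W_{z+8} - \sigma_1(W_{z+6}) - W_{z+1} - \sigma_0(W_{z-7}), \qquad (18)$$
$$W_{z-9} = W_{z+7} - \sigma_1(W_{z+5}) - W_z - \sigma_0(W_{z-8}), \qquad (19)$$
$$W_{z-10} = W_{z+6} - \sigma_1(W_{z+4}) - W_{z-1} - \sigma_0(W_{z-9}). \qquad (20)$$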

We note that $W_{z+5}$ is used in (19) and (14); we compensate for it by using
$W_{z+7}$ and $W_{z+12}$. By "compensating" we mean making the equation value
independent from $W_{z+5}$ by forcing $W_{z+7} - \sigma_1(W_{z+5}) = C$ ($C$ is
some constant; we use 0 for simplicity) and $W_{z+12} - W_{z+5} = C$. $W_{z+7}$
is also used in (17); however, we can use $W_{z+9}$ to compensate for it, i.e.
set $W_{z+9} = \sigma_1(W_{z+7}) = \sigma_1^2(W_{z+5})$. Then $W_{z+9}$ and
$W_{z+12}$ are used in the steps above, so we continue this recursively and
finally have the following constraints that ensure the proper compensation of
values of $W_{z+5}$.


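$$W_{z+7} = \sigma_1(W_{z+5}),$$
$$W_{z+9} = \sigma_1^2(W_{z+5}),$$
$$W_{z+11} = \sigma_1^3(W_{z+5}),$$
$$W_{z+13} = \sigma_1^4(W_{z+5}), \qquad (21)$$
$$W_{z+15} = \sigma_1^5(W_{z+5}),$$
$$W_{z+12} = W_{z+5},$$
$$W_{z+14} = 2\sigma_1(W_{z+5}).$$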
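The effect of these constraints can be checked numerically. The sketch below assumes the SHA-256 message expansion and the choice $C = 0$; window positions are given as offsets from $z$, and all function names are illustrative. It also checks the forward words $W_{z+16}, \ldots, W_{z+22}$ against changes of $W_{z+8}$.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def controlled_window(base, w5, w8):
    """Build W_z..W_{z+15} from 16 arbitrary base words, the two neutral words,
    and the compensation constraints (with C = 0)."""
    w = list(base)
    w[5], w[8] = w5, w8
    w[7] = small_sigma1(w5)
    w[9] = small_sigma1(w[7])
    w[11] = small_sigma1(w[9])
    w[13] = small_sigma1(w[11])
    w[15] = small_sigma1(w[13])
    w[12] = w5
    w[14] = (2 * small_sigma1(w5)) & MASK
    return w


def expand_backward(window, count):
    """Return W_{z-count}..W_{z-1} computed from the 16-word window."""
    ext = list(window)
    for _ in range(count):
        # ext[0] is W_{t+1}; compute W_t from W_{t+16}, W_{t+14}, W_{t+9}, W_{t+1}.
        new = (ext[15] - small_sigma1(ext[13]) - ext[8] - small_sigma0(ext[0])) & MASK
        ext.insert(0, new)
    return ext[:count]


def expand_forward(window, count):
    """Return W_{z+16}..W_{z+15+count} computed from the 16-word window."""
    ext = list(window)
    for _ in range(count):
        new = (small_sigma1(ext[-2]) + ext[-7] + small_sigma0(ext[-15]) + ext[-16]) & MASK
        ext.append(new)
    return ext[16:]
```

With the constraints in place, changing $W_{z+5}$ leaves the ten backward words untouched, while without them the change would spread through equations (14) and (19).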


The second chunk is independent from $W_{z+8}$ automatically without any
compensation. The 33-step two-chunk is valid regardless of the choice of $z$ as
long as $z > 10$. To simplify the notation, we use $W_j, \ldots, W_{j+32}$ to
denote the two chunks; then $W_{j+15}$ and $W_{j+18}$ are the two neutral words.
We reserve the final choice of $j$ for later, to pick the one that allows
attacking the most steps.


5 Preimage Attack against 43 Steps SHA-256


5.1 Number of Attacked Steps


The attack on SHA-256 uses the 33-step two-chunk $W_j, \ldots, W_{j+32}$
explained in Section 4.8. Hence, in the forward direction, $p_{j+33}$ can be
computed independently of the other chunk, and in the backward direction, $p_j$
can be computed independently of the other chunk. We extend the number of
attacked steps as much as possible with the partial-fixing (PF) and
indirect-partial-matching (IPM) techniques.


Forward computation of $A_{j+34}$: The equation for $A_{j+34}$ is as follows.


$$A_{j+34} = \Sigma_0(A_{j+33}) + \mathrm{Maj}(A_{j+33}, B_{j+33}, C_{j+33}) + H_{j+33} + \Sigma_1(E_{j+33}) + \mathrm{Ch}(E_{j+33}, F_{j+33}, G_{j+33}) + K_{j+33} + W_{j+33},$$
$$W_{j+33} = \sigma_1(W_{j+31}) + W_{j+26} + \sigma_0(W_{j+18}) + W_{j+17}.$$


We can use either PF or IPM to compute $A_{j+34}$. If we use PF, we fix the
lower $l$ bits of $W_{j+18}$, which is a neutral word for the other chunk.
According to Table 2, this fixes the lower $l - 18$ bits of
$\sigma_0(W_{j+18})$. Finally, the lower $l - 18$ bits of $A_{j+34}$ can be
computed. If we use IPM, we describe $A_{j+34}$ as a sum of functions of each
neutral word, i.e. $A_{j+34} = \psi_F(W_{j+15}) + \xi_F(W_{j+18})$. From the
above equations, this can be done easily. Note that IPM is more efficient than
PF with respect to only computing $A_{j+34}$, because IPM does not need to fix
a part of a neutral word.

Forward computation of $A_{j+35}$: The equation for $A_{j+35}$ is as follows.


$$A_{j+35} = \Sigma_0(A_{j+34}) + \mathrm{Maj}(A_{j+34}, B_{j+34}, C_{j+34}) + \cdots + W_{j+34},$$
$$W_{j+34} = \sigma_1(W_{j+32}) + W_{j+27} + \sigma_0(W_{j+19}) + W_{j+18}.$$


Neither PF nor IPM can compute $A_{j+35}$. If we used PF for $A_{j+34}$, only
the lower $l - 18$ bits are known. This makes all bits of $A_{j+35}$ unknown
after the computation of $\Sigma_0(A_{j+34})$. If we used IPM, $A_{j+34}$ is
described as a sum of two independent functions. However, because $\Sigma_0$
consists of the XOR of three self-rotations, it seems difficult to describe
$\Sigma_0(A_{j+34})$ as a sum of two independent functions.


In summary, we can skip only 1 step in forward. In this case, using IPM is more
efficient than using PF.


Backward computation of $H_{j-1}$: The equation for $H_{j-1}$ is as follows.


$$H_{j-1} = A_j - (\Sigma_0(B_j) + \mathrm{Maj}(B_j, C_j, D_j)) - \Sigma_1(F_j) - \mathrm{Ch}(F_j, G_j, H_j) - K_{j-1} - W_{j-1},$$
$$W_{j-1} = W_{j+15} - \sigma_1(W_{j+13}) - W_{j+8} - \sigma_0(W_j).$$


We can use either PF or IPM to compute $H_{j-1}$. If we use PF, we fix the lower
$l$ bits of $W_{j+15}$, and then the lower $l$ bits of $H_{j-1}$ can be
computed. If we use IPM, we describe $H_{j-1}$ as a sum of functions of each
neutral word.

Fig. 3. Separation of chunks and dependencies of state words for SHA-256


Backward computation of $H_{j-2}$: The equation for $H_{j-2}$ is as follows.

$$H_{j-2} = A_{j-1} - (\Sigma_0(B_{j-1}) + \mathrm{Maj}(B_{j-1}, C_{j-1}, D_{j-1})) - \Sigma_1(F_{j-1}) - \mathrm{Ch}(F_{j-1}, G_{j-1}, H_{j-1}) - K_{j-2} - W_{j-2},$$
$$W_{j-2} = W_{j+14} - \sigma_1(W_{j+12}) - W_{j+7} - \sigma_0(W_{j-1}).$$


We can use PF to compute $H_{j-2}$ but cannot use IPM. Describing
$\mathrm{Ch}(F_{j-1}, G_{j-1}, H_{j-1})$ and $\sigma_0(W_{j-1})$ as a sum of two
independent functions seems difficult. If we used PF for $H_{j-1}$, we can
obtain the lower $l$ bits of $\mathrm{Ch}(F_{j-1}, G_{j-1}, H_{j-1})$ and the
lower $l - 18$ bits of $\sigma_0(W_{j-1})$. Finally, we can compute the lower
$l - 18$ bits of $H_{j-2}$.


By a similar analysis, we confirmed that we cannot compute $H_{j-3}$. In
summary, we can skip 2 steps in backward with PF, which fixes the lower $l$,
$l > 18$, bits of $W_{j+15}$.

The attack uses the 33-step two-chunk $W_j, \ldots, W_{j+32}$ including the
4-step initial structure. Apply PF for $W_{j-1}$ and $W_{j-2}$, and apply IPM
for $W_{j+33}$. Finally, 43 steps are attacked by skipping an additional 7 steps
using the partial-matching technique.

36 steps ($W_{j-2}$ to $W_{j+33}$) must be located sequentially. We have several
options for $j$. We choose $j = 3$ for the following two purposes: (1) $W_{13}$,
$W_{14}$, and $W_{15}$ can be freely chosen to satisfy message padding rules,
(2) a pseudo-preimage attack on SHA-224 is possible (explained in Section 7.2).

We need to fix the lower $l$ bits of $W_{18}$ so that PF yields the lower
$l - 18$ bits in the backward direction. Besides, we lose half of the remaining
freedom to construct the 4-step initial structure. Hence, we choose $l$ to
balance $l - 18$ and $\frac{32-l}{2}$, i.e. we choose $l = 23$.

The overview of the separation of chunks is shown in Fig. 3. In the figure,
separate symbols mark variables depending only on $W_{21}$; variables depending
only on $W_{18}$; registers that can be expressed as a sum modulo $2^{32}$ of
two independent functions of the neutral variables $W_{18}$ and $W_{21}$;
registers with few bits depending only on $W_{21}$; and registers depending on
both $W_{18}$ and $W_{21}$ in a complicated way.


5.2 Attack Procedure


1. Randomly choose the values for the internal chaining variable $p_{19}$ (after
the movement of message words by the initial structure) and message word
$W_{19}$. Randomly fix the lower 23 bits of $W_{18}$. By using the remaining 9
free bits of $W_{18}$, find $2^5$ values on average that correctly construct the
4-step initial structure, and store them in the table $T_W$. Let us call this
the initial table-preparation.

2. Randomly choose message words not related to the initial structure and
neutral words, i.e. $W_{13}$, $W_{14}$, $W_{15}$, $W_{16}$, $W_{17}$, $W_{23}$.
Let us call this the initial configuration.

3. For all $2^5$ possible $W_{18}$ in $T_W$, compute the corresponding $W_{20}$,
$W_{22}$, $W_{24}$, $W_{25}$, $W_{26}$, $W_{27}$, $W_{28}$ as shown in equations
(21). Compute forward and find $\psi_F(W_{18})$. Store the pairs
$(W_{18}, \psi_F(W_{18}))$ in a list $L_F$.

4. For all $2^4$ possible values (the lower 4 bits) of $W_{21}$, compute
backward and find $\xi_F(W_{21})$, which is $\sigma_0(W_{21})$ in this attack,
and the lower 4 bits of $A_{37}$.

5. Compare the lower 4 bits of $A_{37} - \sigma_0(W_{21})$ and the lower 4 bits
of $\psi_F(W_{18})$ stored in $L_F$.

6. If a match is found, compute $A_{37}, B_{37}, \ldots, H_{37}$ with the
corresponding $W_{18}$ and $W_{21}$, and check whether the results from both
directions match each other. If they do, output $p_0$ and $W_0, \ldots, W_{15}$
as a pseudo-preimage.

7. Repeat steps 2 to 6 for all possible choices of $W_{13}$, $W_{16}$, $W_{17}$,
$W_{23}$. Note that the MSB of $W_{13}$ is fixed to 1 to satisfy message
padding. Hence, we have $2^{127}$ freedom for this step.

8. If no freedom remains in step 7, repeat steps 1 to 7.

9. Repeat steps 1 to 8 until the number of pseudo-preimages required by the
conversion of Section 4.2 is obtained. Then, convert them to a preimage
according to [Fact 9.99].


5.3 Complexity Estimation 


We assume the complexity for 1 step function and 1-step message expansion is
$\frac{1}{43}$ of a compression function operation of 43-step SHA-256. We also
assume that the cost of memory access is negligible compared to the computation
time for the step function and message expansion. The complexity for step 1 is
$2^9$ and it uses a memory of $2^5$ words. The complexity for step 2 is
negligible. In step 3, we compute $p_{j+1} \leftarrow R_j(p_j, W_j)$ for
$j = 18, 19, \ldots, 36$ and the corresponding message expansion. Hence, the
complexity is $2^5 \cdot \frac{19}{43}$. We use a memory of $2^5 \times 2$
words. Similarly, in step 4, we compute $p_j \leftarrow R_j^{-1}(p_{j+1}, W_j)$
for $j = 20, 19, \ldots, 2$ and 6 more steps for partial-fixing and
partial-matching. Hence, the complexity is $2^4 \cdot \frac{25}{43}$. In step 5,
we compare the match of the lower 4 bits of $2^9 (= 2^4 \cdot 2^5)$ items.
Hence, $2^5$ results will remain. The complexity for step 6 is
$2^5 \cdot \frac{8}{43}$ and the probability that all other bits match is
$2^{-252}$. Hence, the number of remaining pairs becomes
$2^{-247} (= 2^5 \cdot 2^{-252})$. So far, the complexity from step 2 to step 6
is $2^5 \cdot \frac{19}{43} + 2^4 \cdot \frac{25}{43} + 2^5 \cdot \frac{8}{43}
\approx 2^{4.878}$. In step 7, this is repeated $2^{127}$ times and its
complexity is $2^{131.878}$. Step 8 is computed $2^{120}$ times. This takes
$2^{120} \cdot (2^9 + 2^{131.878}) \approx 2^{251.9}$. This is the complexity of
the pseudo-preimage attack on 43-step SHA-256. Finally, at step 9, preimages are
found with a complexity of $2^{1+(251.878+256)/2} = 2^{254.939} \approx
2^{254.9}$. The required memory for finding a pseudo-preimage is $2^5$ words and
$2^5 \times 2$ words in steps 1 and 3, which is $2^5 \times 3$ words. For
finding a preimage, we need to store $2^{1.9}$ pseudo-preimages for the
unbalanced meet-in-the-middle. This requires a memory of $2^{1.9} \times 24$
words.
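The arithmetic behind these figures can be recomputed in a few lines. The sketch below is a plausibility check only: the per-step cost fractions 19/43, 25/43 and 8/43 and the table size $2^5$ are assumptions chosen because they reproduce the exponent 4.878, and the function name is illustrative.

```python
import math

STEPS = 43  # one step costs 1/43 of a 43-step compression function call


def sha256_attack_exponents():
    """Return the log2 complexities of the inner loop, the pseudo-preimage
    attack, and the converted preimage attack."""
    # Steps 3, 4 and 6 of the procedure (assumed cost fractions, see lead-in).
    inner = 2**5 * 19 / STEPS + 2**4 * 25 / STEPS + 2**5 * 8 / STEPS
    inner_log2 = math.log2(inner)
    # Step 7 repeats the inner loop 2^127 times; step 8 repeats step 1
    # (cost 2^9) together with step 7 a total of 2^120 times.
    pseudo_log2 = 120 + math.log2(2**9 + 2 ** (127 + inner_log2))
    # Conversion of Section 4.2 with x = 256.
    preimage_log2 = 1 + (pseudo_log2 + 256) / 2
    return inner_log2, pseudo_log2, preimage_log2
```

Under these assumptions the three values agree with $2^{4.878}$, $2^{251.9}$ and $2^{254.9}$ to within rounding.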


5.4 Attack on 42 Steps SHA-256 


When we attack 42 steps, we use 1-step IPM instead of 2-step PF in backward.
This allows the attacker to use more message freedom. We choose $l = 10$ so
that $l$ and $\frac{32-l}{2}$ are balanced. Because each chunk has at least 10
free bits, the complexity for finding pseudo-preimages is approximately
$2^{246} (= 2^{256} \cdot 2^{-10})$. The precise evaluation is listed in
Table 1.


6 Preimage Attack against 46 Steps SHA-512 


6.1 Basic Strategy for SHA-512 


For SHA-512, we can attack more steps than SHA-256 by using PF. This is due to
the following two properties:


- The message-word size of SHA-512 is bigger than that of SHA-256. Hence, the
bit-mixing speed of the $\sigma$ and $\Sigma$ functions is slower than in
SHA-256.
- The choice of the three rotation numbers for the $\sigma_0$ function is very
biased.


Considering the above, we decided to use the message freedom available to the
attacker for applying PF as much as possible.

The construction of the 4-step initial structure explained in Section 4.6
consumes a lot of message freedom. Therefore, we do not use the 4-step initial
structure for SHA-512. The construction of the 3-step initial structure also
needs a lot of message freedom. On the other hand, the 2-step initial structure
does not consume any message freedom because we do not have to control the Ch
functions. Finally, in our attack, we use a 31-step two-chunk including the
2-step initial structure. Because the construction of the 2-step initial
structure is much simpler than that of the 4-step initial structure, we omit
the detailed explanation of the construction.


6.2 Chunk Separation 


The 31 message words we use are $W_j$ to $W_{j+30}$. We apply the 2-step initial
structure for $W_{j+15}$ and $W_{j+16}$; hence the neutral word for the first
chunk is $W_{j+16}$ and for the second chunk is $W_{j+15}$. Whenever we change
the value of $W_{j+16}$, we change the values of $W_{j+7}, W_{j+6}, \ldots, W_j$
by the message compensation technique so that the change does not impact the
second chunk. Similarly, whenever we change $W_{j+15}$, we change $W_{j+17}$,
$W_{j+19}$, $W_{j+21}$, $W_{j+22}, \ldots, W_{j+30}$. Finally, $W_j$ to
$W_{j+30}$ can form the 31-step two-chunks.


6.3 Partial-Fixing Technique


We skip 6 steps in backward and 2 steps in forward by PF. Namely, we need to
partially compute $W_{j-1}, W_{j-2}, \ldots, W_{j-6}$ independently of
$W_{j+15}$, and partially compute $W_{j+31}$ and $W_{j+32}$ independently of
$W_{j+16}$. The equations for these message words are as follows.


$$W_{j-1} = W_{j+15} - \sigma_1(W_{j+13}) - W_{j+8} - \sigma_0(W_j),$$
$$W_{j-2} = W_{j+14} - \sigma_1(W_{j+12}) - W_{j+7} - \sigma_0(W_{j-1}),$$
$$W_{j-3} = W_{j+13} - \sigma_1(W_{j+11}) - W_{j+6} - \sigma_0(W_{j-2}),$$
$$W_{j-4} = W_{j+12} - \sigma_1(W_{j+10}) - W_{j+5} - \sigma_0(W_{j-3}),$$
$$W_{j-5} = W_{j+11} - \sigma_1(W_{j+9}) - W_{j+4} - \sigma_0(W_{j-4}),$$
$$W_{j-6} = W_{j+10} - \sigma_1(W_{j+8}) - W_{j+3} - \sigma_0(W_{j-5}),$$
$$W_{j+31} = \sigma_1(W_{j+29}) + W_{j+24} + \sigma_0(W_{j+16}) + W_{j+15},$$
$$W_{j+32} = \sigma_1(W_{j+30}) + W_{j+25} + \sigma_0(W_{j+17}) + W_{j+16}.$$


Recall Table 2. If the lower $l$ bits of the input of $\sigma_0$ are fixed, we
can compute the lower $l - 8$ bits of its output. In backward, if we fix the
lower $l$ bits of $W_{j+15}$, the lower $l$ bits of $W_{j-1}$, the lower
$l - 8$ bits of $W_{j-2}$, the lower $l - 16$ bits of $W_{j-3}$, the lower
$l - 24$ bits of $W_{j-4}$, the lower $l - 32$ bits of $W_{j-5}$, and the lower
$l - 40$ bits of $W_{j-6}$ can become independent of the second chunk. This
results in computing the lower $l$ bits of $H_{j-1}$, the lower $l - 8$ bits of
$H_{j-2}$, the lower $l - 16$ bits of $H_{j-3}$, the lower $l - 41$ bits of
$H_{j-4}$, the lower $l - 49$ bits of $H_{j-5}$, and the lower $l - 57$ bits of
$H_{j-6}$. Note that we also need to consider $\Sigma_1$ to compute $H_{j-4}$,
$H_{j-5}$, and $H_{j-6}$. If we fix the lower $l$ bits of $W_{j+16}$, the lower
$l - 8$ bits of $W_{j+31}$ and the lower $l$ bits of $W_{j+32}$ can become
independent of the first chunk. This results in computing the lower $l - 8$
bits of $A_{j+32}$ and the lower $l - 47$ bits of $A_{j+33}$.

Therefore, if we choose $l = 60$, we can match the lower 3 bits of $H_{j-6}$ and
13 bits of $A_{j+33}$ after we skip 7 steps by the partial-matching technique.
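The bit counts above follow a simple bookkeeping rule. The sketch below is a simplified model, assuming that each pass through $\sigma_0$ costs 8 known bits, each pass through $\Sigma_1$ costs 41, each pass through $\Sigma_0$ costs 39, and that modular addition and Ch preserve the minimum number of known low bits among their inputs; the function names are illustrative.

```python
WORD = 64  # SHA-512 word size


def clamp(bits):
    return max(0, min(WORD, bits))


def backward_known_bits(l, steps=6):
    """Known low bits of H_{j-1}, ..., H_{j-steps} when the low l bits of
    W_{j+15} are fixed."""
    known = {0: WORD, -1: WORD, -2: WORD}  # H_j and later registers: fully known
    for k in range(1, steps + 1):
        w_bits = clamp(l - 8 * (k - 1))  # W_{j-k} loses 8 bits per sigma0 pass
        f_bits = known[k - 3]            # F_{j-k+1} equals H_{j-k+3}
        sigma1_bits = WORD if f_bits == WORD else clamp(f_bits - 41)
        ch_bits = min(known[k - 3], known[k - 2], known[k - 1])
        known[k] = min(w_bits, sigma1_bits, ch_bits)
    return [known[k] for k in range(1, steps + 1)]


def forward_known_bits(l):
    """Known low bits of A_{j+32} and A_{j+33} when the low l bits of
    W_{j+16} are fixed."""
    a32 = clamp(l - 8)     # W_{j+31} contains sigma0(W_{j+16})
    a33 = clamp(a32 - 39)  # A_{j+33} contains Sigma0(A_{j+32})
    return a32, a33
```

For `l = 60` the model gives 60, 52, 44, 19, 11 and 3 known bits backward, and 52 and 13 known bits forward, matching the counts in the text.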


6.4 Attack Overview 


The attack uses the 31-step two-chunk $W_j, \ldots, W_{j+30}$ including the
2-step initial structure. Apply PF for $W_{j-1}, W_{j-2}, \ldots, W_{j-6}$, and
$W_{j+31}$, $W_{j+32}$. Finally, 46 steps are attacked by skipping an additional
7 steps using the partial-matching technique.

39 steps ($W_{j-6}$ to $W_{j+32}$) must be located sequentially. Because
$W_{j+8}$, $W_{j+9}$, $W_{j+10}$, $W_{j+11}$, $W_{j+12}$, $W_{j+13}$,
$W_{j+14}$, $W_{j+18}$, $W_{j+20}$ are the message words we fix in advance, we
choose $j = 6$ so that $W_{14}$ and $W_{15}$ can be chosen to satisfy message
padding rules. The MSB of $W_{13}$ can also be satisfied. In this chunk
separation, $W_{j+7}$ can be described as $W_{j+7} = \mathit{Const} - W_{j+16}$,
where $\mathit{Const}$ is a chosen fixed value and the lower $l$ bits of
$W_{j+16}$ are fixed. If we fix $\mathit{Const}$ and the MSB of $W_{j+16}$ to 0
and some value, respectively, and choose the lower $l$ bits of $W_{j+16}$ so
that the MSB of $-W_{j+16}$ does not change for all active bits of $W_{j+16}$,
we can always fix the MSB of $W_{j+7}$.

The number of free bits in $W_{j+16}$ is 3. ($l = 60$, but we fix the MSB to
satisfy padding for $W_{13}$.) The number of free bits in $W_{j+15}$ is 4.
Results from both chunks are compared with 3 bits. Therefore, the final
complexity of the pseudo-preimage attack is approximately $2^{509}$. This is
converted to a preimage attack whose complexity is approximately $2^{511.5}$.
For finding pseudo-preimages, this attack needs to store $2^3$ items. Hence, the
required memory is $2^3 \times 9$ words. For finding preimages, we need to store
$2^{1.5}$ pseudo-preimages for the unbalanced meet-in-the-middle. This requires
a memory of $2^{1.5} \times 24$ words.


6.5 Attack on 42 Steps SHA-512 


When we attack 42 steps, we stop using 1-step PF in forward and 3-step PF in
backward. We choose $l = 40$. Because each chunk has at least 24 free bits, the
complexity for finding pseudo-preimages is approximately
$2^{488} (= 2^{512} \cdot 2^{-24})$. The precise evaluation is listed in
Table 1.


7 Remarks 


7.1 Length of Preimages 


The preimages are at least two blocks long: the last block is used to find
pseudo-preimages and the second-last block links to the input chaining value of
the last block. Two-block preimages are only possible if we can preset the
message words used for encoding the length ($m_{14}$ and $m_{15}$ for SHA-2) of
the last block according to the padding and length encoding rules. In our case,
this can be done in the first step of the algorithm. On the other hand, we can
leave $m_{14}$ and $m_{15}$ random; later we can still resolve the length using
expandable messages.


7.2 SHA-224 and SHA-384 


Our attack on 43 steps SHA-256 can also produce pseudo-preimages for SHA-224 by
using the approach by Sasaki [23]. In our attack, we match 4 bits of $A_{37}$,
which is essentially equivalent to $G_{43}$. Then, we repeat the attack until
the other registers randomly match, i.e. we wait until $A_{43}, B_{43}, \ldots,
F_{43}$, and $H_{43}$ randomly match. In SHA-224, the value of $H_{43}$ is
discarded in the output. Hence, we do not have to care about the match of
$H_{43}$, which decreases the complexity by a factor of $2^{32}$. Hence,
pseudo-preimages of SHA-224 can be computed with a complexity of
$2^{219.9} (= 2^{251.9} \cdot 2^{-32})$. Note that this cannot be converted to a
preimage attack on SHA-224 because the size of the intermediate chaining
variable is 256 bits.

If we apply our attack on SHA-512 to SHA-384, $W_{13}$, $W_{14}$, and $W_{15}$
will depend on neutral words. Hence, we cannot confirm whether 46 steps SHA-384
can be attacked or not, because of the padding problem. However, 43 steps
SHA-384 can be attacked by using the same chunk as SHA-256. By considering the
difference of word size and the application of PF, we can optimize the
complexity by choosing $l = 27$ so that $l - 8$ and $\frac{64-l}{2}$ are
balanced.


7.3 Multi-preimages and Second-Preimages


We note that the method converting pseudo-preimages to preimages can be further
extended to find multi-preimages. We first find $k$-block multi-collisions [24],
then follow the expandable message to link to the final block. This gives $2^k$
multi-preimages with an additional $k 2^{n/2}$ computations, which is negligible
when $k$ is much smaller than $2^{(n-t)/2}$ ($t$ denotes the number of bits for
each chunk; refer to Section 4). We need an additional $128k$ bytes of memory
to store the $k$-block multi-collisions. Furthermore, since most of the message
words are randomly chosen, this attack naturally gives second preimages with
high probability. The above multi-preimages are most probably multi-second
preimages.


8 Conclusions 


In this paper, we presented preimage attacks on 43 steps SHA-256 and 46 steps
SHA-512. The time complexity of the attack for 43-step SHA-256 is $2^{254.9}$
and it requires $2^5 \cdot 3$ words of memory. The time complexity of the attack
for 46-step SHA-512 is $2^{511.5}$ and it requires $2^3 \cdot 9$ words of
memory. The number of attacked steps is greatly improved from the best previous
attack; in other words, the security margin of SHA-256 and SHA-512 is greatly
reduced. Because SHA-256 and SHA-512 have 64 and 80 steps, respectively, they
are currently secure.

An open question worth investigating would be to see if the current attacks 
may still be improved. Perhaps finding 15+ 4+ 15 pattern of chunks with 4-step 
initial structure in the middle or using better partial-fixing technique that would 
utilize middle bits of the message word would extend the attacks. 

The preimage attack we presented creates a very interesting situation for SHA-2,
in which a preimage attack, covering 43 or 46 steps, is much better than the
best known collision attack, with only 24 steps. Our attack does not convert to
a collision attack because its complexity is above the birthday bound. However,
we believe that the existence of such a preimage attack suggests that a
collision attack of similar length could also be possible. In that light, the
problem of finding collisions for reduced variants of SHA-256 definitely
deserves more attention.

Abstract. We demonstrate how the framework that is used for creating
efficient number-theoretic ID and signature schemes can be transferred
into the setting of lattices. This results in constructions of the most
efficient to-date identification and signature schemes with security based
on the worst-case hardness of problems in ideal lattices. In particular,
our ID scheme has communication complexity of around 65,000 bits and
the length of the signatures produced by our signature scheme is about
50,000 bits. All prior lattice-based identification schemes required on the
order of millions of bits to be transferred, while all previous lattice-based
signature schemes were either stateful, too inefficient, or produced
signatures whose lengths were also on the order of millions of bits. The
security of our identification scheme is based on the hardness of finding
the approximate shortest vector to within a factor of $\tilde{O}(n^2)$ in the
standard model, while the security of the signature scheme is based on the
same assumption in the random oracle model. Our protocols are very
efficient, with all operations requiring $\tilde{O}(n)$ time.

We also show that the technique for constructing our lattice-based
schemes can be used to improve certain number-theoretic schemes. In
particular, we are able to shorten the length of the signatures that are
produced by Girault's factoring-based digital signature scheme.


1 Introduction 


The appeal of building cryptographic primitives based on the hardness of lattice 
problems began with the seminal work of Ajtai who showed that one-way func- 
tions could be built with security based on the worst-case hardness of certain 
lattice problems [2]. Unfortunately, cryptographic primitives that were built
with this very strong security property were extremely inefficient for practical
applications. For example, evaluating one-way and collision-resistant hash
functions required $\tilde{O}(n^2)$ time and space, and in public-key
cryptosystems, the keys
were on the order of megabytes (also see for concrete parameter 
proposals for the scheme in [B4]). Therefore some new ideas were required in 
order to make provably-secure lattice-based primitives a realistic alternative to 
ones based on number-theory. 

A promising approach for improving efficiency is to use lattices that possess 
extra algebraic structure, and it is precisely this extra structure that makes the 
NTRU cryptosystem [14] (which unfortunately does not have a proof of
security) very efficient in practice. A step in the direction of building provably-secure
lattice-based primitives was taken by Micciancio [23], who showed that one could 
build efficient (O(n) evaluation time) one-way functions with security based on 
the worst-case instances of problems pertaining to cyclic lattices (cyclic lattices 
are lattices that correspond to ideals in the ring Z[x]/(x^n − 1)). This result was
later extended to give constructions of collision-resistant hash functions by ei- 
ther restricting the domain or changing the ring in Micciancio’s scheme. 
These works then led to constructions and implementations of collision-resistant 
hash functions with security based on worst-case problems in lattices
corresponding to ideals in Z[x]/(x^n + 1) whose performance was comparable to
that of the ad-hoc hash functions in use today. And because
there is a very close connection between collision-resistant hash functions
and more sophisticated primitives such as ID schemes and digital signatures, it 
was very natural to ask whether these primitives also had efficient lattice-based 
constructions. There has been some recent work in this direction, which we will 
now describe. 

Lyubashevsky and Micciancio constructed a one-time signature in which sign- 
ing and verification can be performed in time O(n) [IJ]. Using standard tech- 
niques, the one-time signature can be transformed into a full-fledged signature 
scheme using a signature-tree with only an additional work factor of 
O(log n). While this combination results in a very theoretically-appealing scheme 
where all the operations take time O(n), it does require the use of a tree, which 
is a somewhat unwanted feature in practice. Another signature scheme was pro- 
posed by Gentry et al. in R]. Their signature scheme follows the hash-and-sign 
paradigm, and when instantiated with algebraic lattices [B7], verification takes 
time O(n), but O(n*) time is needed to do the signing (it is plausible that the 
signing time could be reduced to O(n?) with a more careful analysis). 

A different way of constructing digital signature schemes is to first construct 
an identification scheme of a certain form and then convert it to a signature 
scheme using the Fiat-Shamir transform [MBM]. The identification schemes of 
Micciancio and Vadhan [26], Lyubashevsky [IZ], and Kawachi et al. can 
all be instantiated such that the secret and public keys are of size O(n), and 
the entire interaction takes O(n) time as well. While these constructions seem 
essentially optimal, they contain a common inefficiency. The ID schemes all 
have the form of standard commit-challenge-response protocols (see Figure 1
for an example of one where Y is the commitment, c is the challenge, and z is 
the response), and the inefficiency lies in the fact that for each challenge bit, 
the response consists of O(n) bits. Since the security of the protocol is directly
connected to the number of challenge bits sent by the verifier, it means that for 
every bit of security, O(n) bits need to be transmitted. Theoretically, this does 
not cause a problem because one only needs ω(log n) bits of security in order for
the protocol to be considered secure against polynomial-time adversaries, and 
then the total running time of the above protocols is still O(n). But in practice, 
this is a rather unsatisfactory solution because one wants some concrete security 
guarantee, say 80 bits, and then the communication complexity of the ID scheme 
will be about 80 times larger (the size of the signature in the derived signature 
scheme would be 160 times larger) than possibly necessary. This is in sharp 
contrast to number-theoretic ID schemes where the response of the prover is 
longer than the challenge by only a small factor. 

What allows number-theoretic ID schemes like Schnorr [35], GQ [13], Girault
[10], Okamoto [27], etc. to be so “compact” is that the challenge string in these
protocols is not treated as a sequence of independent 0’s and 1’s, but instead 
the entire string is interpreted as an integer from a certain domain. This can be 
done because there is a lot of underlying algebraic structure upon which these 
schemes are built. On the other hand, lattices do not seem to have as much 
algebraic underpinning, and so the schemes based on them are very combinatorial 
in nature, which is why the challenge strings are treated simply as a sequence of
independent challenges much like in generic zero-knowledge proofs for NP. The 
main accomplishment of the current work is to show how to exploit the limited 
algebraic structure of ideal lattices in order to use the challenge bits collectively 
rather than individually, which ends up greatly improving the practical efficiency 
of lattice-based identification and signature schemes. 


1.1 Contributions and Comparisons 


Lattice-based constructions. We construct a lattice-based ID scheme in 
which the challenge string is treated as a polynomial in a certain ring, and one 
correct response to it from the prover is enough for authentication. The caveat is 
that some constant fraction of the time, the prover cannot respond to the chal- 
lenge from the verifier and must abort the protocol. The result of this is that the 
“commit” and “challenge” steps of the ID scheme now must be repeated several 
times to ensure that a valid prover is accepted with some decent probability. 
But using standard techniques, one can significantly shorten the length of the 
“commit” part of the protocol, and because of the structure of our scheme, the 
challenge can always be the same. Therefore the number of transmitted bits is 
dominated by the length of the single “response”. 

Even more optimizations are possible when converting the ID scheme into 
a signature scheme using the Fiat-Shamir transform. In the resulting signature 
scheme, there is of course no longer any interaction until the signer outputs the 
signature. And therefore there is no need for the signer to output the attempts 
in which he failed to sign (which correspond to the times he couldn’t answer the 
challenge in the ID scheme). So while the failures do cost time, the length of 
the final signature is as short as it would have been if the signer only attempted 
to sign once and succeeded. And because the probability of failure is a small
constant (≈ 2/3), we expect to run the signing protocol only about 3 times before
succeeding.
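
The figure of about 3 attempts follows from treating signing attempts as independent trials: with failure probability q, the expected number of attempts is 1/(1 − q). A minimal sketch of that arithmetic, where the function name is ours and 2/3 is the approximate failure probability quoted above:

```python
from fractions import Fraction

def expected_attempts(failure_probability: Fraction) -> Fraction:
    """Mean number of independent attempts until the first success."""
    if not 0 <= failure_probability < 1:
        raise ValueError("failure probability must lie in [0, 1)")
    return 1 / (1 - failure_probability)

print(expected_attempts(Fraction(2, 3)))  # prints 3
```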

All operations in our scheme take time O(n) and we prove that the ID and 
signature schemes are secure based on the worst-case hardness of approximating 
the shortest vector to within a factor of O(n^2) in lattices corresponding to ideals
in the ring Z[x]/(x^n + 1) (the security of the signature scheme is in the random
oracle model). Compared to previous works, our asymptotic hardness assumption 
is the same as that in (although the scheme of [19] is secure in the standard
model), but is worse than that in (where the factor is O(n^1.5)) and in
(where the factor is O(n)).

Based on the work of Gama and Nguyen [$| who worked out the effectiveness 
of current state-of-the-art lattice reduction algorithms, we present some con- 
crete parameters with which our schemes can be instantiated. On the low end, 
the outputted signatures are about 50000 bits in length (the ID scheme requires 
about 65000 bits to be transmitted). While the scheme of has better asymp- 
totic security, the response to each challenge bit seems to require at least 10000 
bits. So if we would like the challenge to be 160 bits for security purposes, the 
response (and therefore the signature size) will be over a million bits. The signa- 
ture schemes of and [I] look like they would have their signatures be about 
160 times longer than ours (the ID schemes would require communications that 
are about 80 times longer), again because the responses are done separately to 
every challenge bit. So even though our ID and signature schemes have worse 
asymptotic security, their structure makes them much more practically efficient. 
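
The size comparison above is plain multiplication: when every challenge bit is answered separately, the response grows linearly in the number of challenge bits. A small sketch of that arithmetic, using only the figures quoted in the text (the function name is ours):

```python
def bitwise_response_bits(bits_per_challenge_bit: int, challenge_bits: int) -> int:
    """Total response length when each challenge bit gets its own response."""
    return bits_per_challenge_bit * challenge_bits

# 10,000 bits per challenge bit and a 160-bit challenge, as quoted in the text.
per_bit_scheme = bitwise_response_bits(10_000, 160)
print(per_bit_scheme)            # 1600000, i.e. "over a million bits"
print(per_bit_scheme // 50_000)  # 32: ratio to a signature of about 50,000 bits
```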

At this point it is not possible to give an accurate comparison of our signa- 
ture scheme to the hash-and-sign signature schemes because no concrete 
parameters were given for those schemes. But independent of the signature sizes, 
our scheme will still have the advantage in that signing can be done in time O(n) 
rather than O(n*). 

The signature length of the one-time signature in may actually be a 
little shorter than in our scheme, but this advantage is lost when the one-time 
signature gets converted to a general stateless signature scheme. If a signature 
tree is used in the conversion, then the signature length may go up by a factor 
of the tree depth, which would make it much less efficient. On the other hand, 
one could build a hash tree using any collision-resistant hash function, and then 
the signatures would only increase by the product of the tree depth and the 
hash function output. If the scheme is to be completely stateless and support 
about 2^60 signatures, and we use SHA-256 as the hash function, then the length
of the one-time signature in would increase by about 15,000 bits, which 
would make it somewhat longer than our signature. The similarity between the 
signature sizes of our scheme and the scheme in is no coincidence, and we 
further discuss the relationship between the two schemes in Section DA 


Factoring-based signatures. We show that the ideas used to construct our 
lattice-based digital signature can also be used for shortening the length of some 
number-theoretic schemes. The signature scheme originally proposed by Girault 
[10], and analyzed in 0B], is a scheme whose security, in the random oracle
model, is based on the hardness of factorization. What is particularly attrac- 
tive about it is that if the signer can do some pre-computing before receiving 
the message, then signing can be done with just one random oracle query, one 
multiplication, and one addition over the integers (no modular reduction is re- 
quired). We show how to reduce the length of the signature in an instantiation 
of the scheme due to Pointcheval BI] from 488 bits to 425. 


1.2 Techniques 


There is a pattern that emerges when looking at constructions of certain ID 
and signature schemes based on the hardness of factoring and discrete log. The 
informal chain of reductions from the hard problem to the signature scheme 
looks as follows: 


Hard Problem < CRHF < One-time signature < ID scheme < Signature 


For example, finding collisions in the hash function h(x) = g^x mod N implies
being able to factor N. This can be converted into a one-time signature with the
secret key being some pair of integers (x, y), the public key being h(x), h(y), and
the signature of a message c being xc + y. The one-time signature can then be
converted into an ID scheme by simply picking a new y every time (Figure 1)
and c now being a challenge chosen by the verifier. The ID scheme can then 
be converted to a signature scheme by using the Fiat-Shamir transform which 
replaces the verifier with a random oracle (Figure D) [OB]. The same idea 
can be used with the hash function h(x_1, x_2) = g_1^{x_1} g_2^{x_2} mod p, in which finding
collisions implies solving the discrete log problem. The ID and signature schemes 
resulting from that hash function are due to Okamoto [27]. 
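
The step from the hash function h(x) = g^x mod N to a one-time signature can be sketched as follows. This is a toy illustration only: the modulus, the base, and all function names are hypothetical choices of ours, and the numbers are far too small to offer any security. Verification relies on g^(xc + y) = (g^x)^c · g^y mod N.

```python
# Toy parameters, hypothetical and far too small to be secure.
N = 3233   # 61 * 53
G_BASE = 5

def h(x: int) -> int:
    """Hash h(x) = g^x mod N from the text."""
    return pow(G_BASE, x, N)

def keygen(x: int, y: int):
    """Secret key is the pair (x, y); public key is (h(x), h(y))."""
    return (x, y), (h(x), h(y))

def sign(secret_key, c: int) -> int:
    """One-time signature of message c: x*c + y, computed over the integers."""
    x, y = secret_key
    return x * c + y

def verify(public_key, c: int, signature: int) -> bool:
    """Accept iff g^signature == h(x)^c * h(y) (mod N)."""
    hx, hy = public_key
    return h(signature) == (pow(hx, c, N) * hy) % N

sk, pk = keygen(1234, 987)
sig = sign(sk, 77)
print(verify(pk, 77, sig))      # True
print(verify(pk, 77, sig + 1))  # False
```

Signing a second message with the same (x, y) would give two linear equations in x and y, which is why the ID scheme picks a fresh y for every run.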

It turns out that a somewhat similar approach can be used to build
lattice-based primitives as well. The works of [29,18] showed a reduction from the
worst-case problem of finding short vectors in algebraic lattices to finding collisions in
hash functions. The work of can be viewed as a transformation of the hash 
function to a one-time signature, and this current work can then be seen as the 
continuation of this chain of reductions where the one-time signature of is 
converted into an ID scheme and then into a signature scheme. 

But what prevents the techniques used in number-theoretic schemes from being
directly extended to lattice-based ones is that lattices allow for much less
algebraic structure. For example, the domains in number-theoretic hash functions are
rings, while in lattice-based ones, the domain is just a subset of a ring (in partic- 
ular, those elements in the ring that have small Euclidean norm) that is neither 
closed under addition nor multiplication. This is closely related to the fact that
the factoring and discrete log problems can be reduced to finding an element in 
the kernel of some homomorphic function, while finding short vectors in lattices 
reduces to the problem of finding small elements in the kernel of a homomor- 
phism. This difference is what seems to give lattice problems resistance against 
polynomial-time quantum algorithms that solve factoring and discrete log BG, 
but at the same time it also hinders constructions of lattice-based primitives. 

Secret key: s ←$ D_s
Public key: N, g, and S ← g^s mod N

Prover                                  Verifier
y ←$ D_y, Y ← g^y mod N
                      --- Y --->
                                        c ←$ D_c
                      <--- c ---
z ← sc + y
[if z ∉ G then z ← ⊥]
                      --- z --->
                                        Accept iff g^z ≡ Y S^c (mod N)

Fig. 1. Factoring-Based Identification Schemes. The parameters for this scheme
are in Figure[§] The line in [ ] is only performed in the aborting version of the scheme.


In overcoming this limitation, the one-time signature of had to leak parts 
of its secret key. While it wasn’t a problem in that setting because the secret key 
is only used once, in ID schemes the same secret key is used over and over, and 
so leaking a part of the secret key every time would result in complete insecurity. 
In this paper, we solve this difficulty by using an aborting technique that was 
introduced in [17]. The idea behind aborting is that the prover can elect to abort
the protocol in order to protect some information about his secret key (mainly, 
the protocol needs to remain witness-indistinguishable). In this work, we are 
able to relax the conditions that were needed for witness-indistinguishability in 
[17], and this allows us to construct much more efficient lattice-based protocols
as well as extend the technique to other contexts, such as the factoring-based 
scheme described in Section 1.1. We essentially show that all that is needed for
the aborting technique to be applicable is a collision-resistant homomorphic hash 
function that has small elements in its kernel. We believe that this technique can 
find further applications. 


1.3 Intuition for Aborting 


Understanding where aborting might be useful is best accomplished with an 
example. Consider the protocol in Figure 1 (for this discussion, it is not necessary
to understand why the protocol works), which has the form of a typical 3-round
commit-challenge-response ID scheme. The secret key is some s and the public
key is h(s) where h is a function that happens to be h(s) = g^s mod N in our
example. In the first step of the protocol, the prover picks a parameter y, and
sends h(y) to the verifier. The verifier picks a random “challenge” c, and sends
it to the prover. The third step of the protocol consists of a response of the 
prover to the challenge. This response must somehow use the secret key, and in 
our example, the secret key s is multiplied by c and then added to y. Notice that 
sending sc without adding it to y would completely reveal s, and so the job of 
y is to somehow mask the exact value of sc. If the operation sc takes place in 
some finite group, then a natural idea for masking would be to pick y uniformly 
at random from that group. The intuition is that if nothing about y is known, 
then the value y + sc is also completely random (of course, something is known
about y when the prover sends h(y) to the verifier, but we gloss over that here).
And this is exactly what is done in well-known ID schemes such as Schnorr [35],
GQ [13], Okamoto [27], etc.

But sometimes it is infeasible to pick y uniformly at random from the group. 
In Girault’s ID-scheme (Figure 1), the multiplication sc is performed
over the integers, which is an infinite group. A way to do masking in this scheme
is to pick a y in a range that is much larger than the range of sc. So for example,
if 0 ≤ sc ≤ R, then one could pick a random y from the range [0, ..., 2^64 R]. Then,
with very high probability, the value of sc + y will be in [R, ..., 2^64 R], in which
case it will be impossible to determine anything about sc if nothing is known 
about y. 

In constructing our lattice-based ID scheme, the same difficulty is encoun- 
tered as in Girault’s scheme, except we do not have the luxury of picking y 
(or something analogous to y in the lattice-based scheme) from such a large 
range because doing so would require us to make a much stronger complexity 
assumption which would significantly decrease the efficiency of the protocol (we 
would have to assume that it is hard to find a super-polynomial approximation 
of the shortest vector instead of just an O(n^2) approximation). Our solution is
to instead pick y from a much smaller set, something analogous to [0,...,2R], 
but only reveal sc + y if it falls into the range [R,...,2R]. If the range is picked 
carefully and the function h is a homomorphism that has “small” elements in 
its kernel, then one can show that if the prover only reveals values in this range 
and aborts otherwise, the protocol will be perfectly witness-indistinguishable. 
The witness-indistinguishability is then used to prove security of the protocol by 
showing that a forger can be used for extracting collisions in h. 
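
The integer version of this idea can be checked exhaustively for a small range. The sketch below (our own toy, with a hypothetical R = 16 and a function name of our choosing) enumerates every mask y in [0, ..., 2R], keeps z = sc + y only when it lands in [R, ..., 2R], and confirms that the resulting distribution of revealed values is the same for every sc in [0, ..., R]. That independence from sc is what the aborting step buys.

```python
from collections import Counter

def revealed_distribution(sc: int, R: int) -> Counter:
    """Count revealed z = sc + y over all y in [0, 2R], keeping only z in [R, 2R]."""
    counts = Counter()
    for y in range(0, 2 * R + 1):
        z = sc + y
        if R <= z <= 2 * R:   # otherwise the prover aborts and reveals nothing
            counts[z] += 1
    return counts

R = 16
baseline = revealed_distribution(0, R)
for sc in range(0, R + 1):
    # Same counts for every sc: the revealed value carries no information on sc.
    assert revealed_distribution(sc, R) == baseline

print(sum(baseline.values()), "of", 2 * R + 1, "masks lead to a response")  # 17 of 33
```

Roughly half of the masks are rejected in this toy, which mirrors the constant abort probability discussed in the text.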

The same technique can also be applied to Girault’s scheme. Notice that 
if we pick y uniformly at random from the range [0,...,2R] instead of from 
[0, ..., 2^64 R], the length of sc + y will be 63 bits shorter. We point out that our
aborting factoring-based ID scheme in Figure 1, which uses this idea, is actually
worse than the corresponding non-aborting one because the savings gained by
shortening sc + y are lost in case the prover has to abort and the ID protocol
has to be repeated. But the advantage of aborting does show itself when the ID 
protocol is converted into a signature scheme using the Fiat-Shamir paradigm 
(Figure D). In a signature scheme, there is no interaction, and therefore there 
is no need for the signer to ever include the aborted signing attempts into the 
final signature. So if the signer needs to abort, he simply reruns the protocol 
until he gets a signature in the correct range. The end result is that the eventual 
signature is shorter than it would have been in schemes such as where 
the signer does not have the option to abort. 
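
The signing loop described above can be sketched as a toy Fiat-Shamir signature with aborts. Everything concrete here is an assumption made for illustration: the tiny modulus, the 8-bit challenge and secret ranges, SHA-256 standing in for the random oracle, and the choice to output the pair (z, c) as the signature. It is not the paper's instantiation. The three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
import hashlib
import secrets

# Toy parameters, hypothetical and far too small to be secure.
N = 3233                 # 61 * 53
G_BASE = 5
CHALLENGE_BITS = 8       # challenges c lie in [0, 2^8)
SECRET_BOUND = 2 ** 8    # secrets s lie in [0, 2^8)
R = SECRET_BOUND * 2 ** CHALLENGE_BITS   # guarantees 0 <= s*c < R

def random_oracle(commitment: int, message: bytes) -> int:
    """SHA-256 used as a stand-in for the random oracle."""
    digest = hashlib.sha256(str(commitment).encode() + b"|" + message).digest()
    return int.from_bytes(digest, "big") % (2 ** CHALLENGE_BITS)

def sign(s: int, message: bytes):
    """Retry until z = s*c + y falls in [R, 2R]; aborted attempts are discarded."""
    attempts = 0
    while True:
        attempts += 1
        y = secrets.randbelow(2 * R + 1)              # y uniform in [0, 2R]
        c = random_oracle(pow(G_BASE, y, N), message)
        z = s * c + y
        if R <= z <= 2 * R:
            return (z, c), attempts

def verify(S: int, message: bytes, signature) -> bool:
    z, c = signature
    if not R <= z <= 2 * R:
        return False
    # Recover the commitment Y = g^z * S^(-c) mod N and recheck the challenge.
    Y = (pow(G_BASE, z, N) * pow(S, -c, N)) % N
    return random_oracle(Y, message) == c

s = 123
S = pow(G_BASE, s, N)
signature, attempts = sign(s, b"hello")
print(verify(S, b"hello", signature), attempts)
```

Only the successful (z, c) is output, so the signature length does not depend on how many attempts were aborted.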


2 Preliminaries 


2.1 Notation 


We will denote vectors by bold letters. For convenience, vectors of vectors will 
be denoted by a bold letter with a hat. For example, if a_1, a_2 are elements of
Z^n, then we can write â = (a_1, a_2). The ℓ∞ norm of a is written as ||a||∞, and
||â||∞ for â = (a_1, ..., a_m) is defined as max_i(||a_i||∞). If S is a set, then a ←$ S
means that a is chosen uniformly at random from S. All logarithms are assumed
to be base 2.


2.2 Lattices and Algebra 


An integer lattice Λ is a subgroup of Z^n. The approximate Shortest Vector
Problem (SVP_γ(Λ)) asks to find a vector v in Λ such that ||v||∞ is no more than
γ times the smallest ℓ∞ norm of a nonzero vector in Λ. In this work, we
will be interested in lattices that exhibit an additional algebraic property: in
particular, they correspond to ideals in the ring Z[x]/(x^n + 1). We will say that
a lattice Λ is an (x^n + 1)-cyclic lattice if for every vector (v_0, ..., v_{n−2}, v_{n−1}) ∈
Λ, the vector (−v_{n−1}, v_0, ..., v_{n−2}) is also in Λ. If we look at the vectors as
polynomials (i.e. (v_0, ..., v_{n−2}, v_{n−1}) as v_0 + ... + v_{n−2}x^{n−2} + v_{n−1}x^{n−1}), then
an (x^n + 1)-cyclic lattice is an ideal in Z[x]/(x^n + 1) because in this ring,


(v_0 + ... + v_{n−2}x^{n−2} + v_{n−1}x^{n−1}) · x = −v_{n−1} + v_0 x + ... + v_{n−2}x^{n−1}.

The ring that will be most important to us throughout the paper is the ring
Z_p[x]/(x^n + 1) where p is some odd positive integer. The elements in
Z_p[x]/(x^n + 1) will be represented by polynomials of degree n − 1 having coefficients in
the range [−(p−1)/2, (p−1)/2]. Throughout the paper, we will treat polynomials in
Z_p[x]/(x^n + 1) and vectors in Z^n as the same data type. So when, for example,
we talk of multiplying two vectors, we actually mean converting the vectors to
polynomials and then multiplying the polynomials in Z_p[x]/(x^n + 1). Similarly,
the norm (see footnote 1) of a polynomial is just the norm of the corresponding vector. It’s not
hard to see that for polynomials v, w ∈ Z_p[x]/(x^n + 1), the following relation
holds:


||vw||∞ ≤ ||v||∞ ||w||_1 ≤ n ||v||∞ ||w||∞
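
Both the rotation rule for (x^n + 1)-cyclic lattices and the norm relation can be checked numerically. The following is a minimal sketch with a schoolbook (quadratic-time) product rather than the FFT-based one; the toy values p = 257 and n = 8 and all function names are our own choices.

```python
import random

def center(value: int, p: int) -> int:
    """Representative of value mod p in [-(p-1)/2, (p-1)/2], for odd p."""
    r = value % p
    return r - p if r > (p - 1) // 2 else r

def ring_mul(v, w, p):
    """Schoolbook product in Z_p[x]/(x^n + 1); x^n wraps around to -1."""
    n = len(v)
    out = [0] * n
    for i, vi in enumerate(v):
        for j, wj in enumerate(w):
            if i + j < n:
                out[i + j] += vi * wj
            else:
                out[i + j - n] -= vi * wj
    return [center(coeff, p) for coeff in out]

def norm_inf(v):
    return max(abs(c) for c in v)

def norm_1(v):
    return sum(abs(c) for c in v)

p, n = 257, 8
x = [0, 1] + [0] * (n - 2)                  # the polynomial x
v = [1, 2, 3, 4, 5, 6, 7, 8]
print(ring_mul(v, x, p))                    # [-8, 1, 2, 3, 4, 5, 6, 7]

for _ in range(1000):
    a = [random.randint(-3, 3) for _ in range(n)]
    b = [random.randint(-3, 3) for _ in range(n)]
    prod = ring_mul(a, b, p)
    assert norm_inf(prod) <= norm_inf(a) * norm_1(b) <= n * norm_inf(a) * norm_inf(b)
```

Multiplying by x moves every coefficient up one position and feeds the top coefficient back with a minus sign, which is exactly the closure property in the definition of an (x^n + 1)-cyclic lattice.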


(x^n + 1)-cyclic lattices are a particular class of lattices that received attention
because one can construct efficient and provably secure cryptographic primitives 
based on the hardness of finding approximate short vectors in these lattices 
BEIDE]. The main reason for this efficiency is that the multiplication of 
two polynomials in Z_p[x]/(x^n + 1) can be done in time O(n) using the Fast
Fourier Transform. While the results in this paper can be applied to lattices 
that correspond to ideals in other rings, it would only unnecessarily complicate 
matters because the ring Z[x]/(x^n + 1) seems to be the most useful theoretically
and in practice. 

While a lot is known about the complexity of SVP in general lattices, very 
little is known about this problem when restricted to ideal lattices. Nevertheless, 
the problem is related to some problems in algebraic number theory (see [[830)) 


1 This is a slight abuse of the word norm. Because of the reduction modulo p, it’s not
true that for any integer α we have ||αa||∞ = |α| ||a||∞, but it still holds true that
||a + b||∞ ≤ ||a||∞ + ||b||∞ and ||αa||∞ ≤ |α| ||a||∞.
that do not have any efficient solution. And it seems that the currently best
lattice algorithms are unable to take advantage of the extra structure provided 
by ideal lattices. Therefore, it still seems that solving SVP_γ takes time 2^{Ω(n)}
when γ = n^{O(1)} [GH].


2.3 Lattice-Based Collision-Resistant Hash Function 


Let R be the ring Z_p[x]/(x^n + 1). We define the following family of hash functions:


Definition 1. For any integer m and D ⊆ R, the function family
H(R, D, m) mapping D^m to R is defined as


H(R, D, m) = {h_â : â ∈ R^m}, where for any ẑ ∈ D^m, h_â(ẑ) = â · ẑ


That is, if â = (a_1, ..., a_m) and ẑ = (z_1, ..., z_m), then h_â(ẑ) = a_1 z_1 + ... + a_m z_m,
where all the operations are performed in the ring Z_p[x]/(x^n + 1). It’s not hard
to see that the hash functions in H(R, D, m) satisfy the following two properties
for any ŷ, ẑ ∈ R^m and c ∈ R:


h(ŷ + ẑ) = h(ŷ) + h(ẑ)    (1)


h(ŷc) = h(ŷ)c    (2)
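
Properties (1) and (2) can be checked directly on random inputs. The sketch below uses toy sizes (p = 257, n = 4, m = 3) and function names chosen by us, and stores coefficients in [0, p) rather than the centered range so that results can be compared with plain equality.

```python
import random

P, N_DEG, M = 257, 4, 3   # toy sizes, chosen only for illustration

def ring_mul(v, w):
    """Product in Z_P[x]/(x^N_DEG + 1), coefficients reduced to [0, P)."""
    out = [0] * N_DEG
    for i, vi in enumerate(v):
        for j, wj in enumerate(w):
            k = i + j
            if k < N_DEG:
                out[k] += vi * wj
            else:
                out[k - N_DEG] -= vi * wj
    return [c % P for c in out]

def ring_add(v, w):
    return [(a + b) % P for a, b in zip(v, w)]

def hash_h(a_hat, z_hat):
    """h(z_hat) = a_1*z_1 + ... + a_m*z_m in the ring."""
    acc = [0] * N_DEG
    for a, z in zip(a_hat, z_hat):
        acc = ring_add(acc, ring_mul(a, z))
    return acc

def rand_poly():
    return [random.randrange(P) for _ in range(N_DEG)]

def rand_vec():
    return [rand_poly() for _ in range(M)]

a_hat, y_hat, z_hat, c = rand_vec(), rand_vec(), rand_vec(), rand_poly()

# Property (1): h(y + z) = h(y) + h(z)
sum_hat = [ring_add(y, z) for y, z in zip(y_hat, z_hat)]
assert hash_h(a_hat, sum_hat) == ring_add(hash_h(a_hat, y_hat), hash_h(a_hat, z_hat))

# Property (2): h(y c) = h(y) c
scaled = [ring_mul(y, c) for y in y_hat]
assert hash_h(a_hat, scaled) == ring_mul(hash_h(a_hat, y_hat), c)
```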
The collision problem Col(h, D) is defined as follows: 


Definition 2. Given an element h ∈ H(R, D, m), the collision problem
Col(h, D), where D ⊆ R, asks to find two distinct elements ẑ, ẑ' ∈ D^m such
that h(ẑ) = h(ẑ').


In [18], it was shown that when D is some restricted domain, solving the
Col(h, D) problem for random h ∈ H(R, D, m) is as hard as solving SVP_γ
for any (x^n + 1)-cyclic lattice.


Theorem 1. Let R = Z_p[x]/(x^n + 1) be a ring where n is any power of 2, and
define D = {y ∈ R : ||y||∞ ≤ d} for some integer d. Let H(R, D, m) be a hash
function family as in Definition 1 such that m > log p / log 2d and p ≥ 4dmn^1.5 log n.
If there is a polynomial-time algorithm that solves the Col(h, D) problem for
a random h ∈ H(R, D, m) with some non-negligible probability, then there is
a polynomial-time algorithm that can solve SVP_γ(Λ) for every (x^n + 1)-cyclic
lattice Λ, where γ = 16dmn log^2 n.


2.4 Cryptographic Definitions 


Digital Signatures. We recall the definitions of signature schemes and what 
it means for a signature scheme to be secure. 
Definition 3. A signature scheme consists of a triplet of polynomial-time (possibly
probabilistic) algorithms (G, S, V) such that for every pair of outputs (s, v)
of G(1^n) and any n-bit message m,


Pr[V(v, m, S(s, m)) = 1] = 1

where the probability is taken over the randomness of algorithms S and V.


In the above definition, G is called the key-generation algorithm, S is the signing 
algorithm, V is the verification algorithm, and s and v are, respectively, the 
signing and verification keys. 

A signature scheme is said to be secure if there is only a negligible probability 
that any forger, after seeing signatures of messages of his choosing, can sign a 
message whose signature he has not already seen [A]. 


Definition 4. A signature scheme (G, S, V) is said to be secure if for every
polynomial-time (possibly randomized) forger F, the probability that after seeing
the public key and {(μ_1, S(s, μ_1)), ..., (μ_q, S(s, μ_q))} for any q messages μ_i of
its choosing (where q is polynomial in n), F can produce (μ ≠ μ_i, σ) such that
V(v, μ, σ) = 1, is negligibly small. The probability is taken over the randomness
of G, S, V, and F.


In the standard security definition of a signature scheme, the forger should not be 
able to produce a signature of a new message. A stronger notion of security, called 
strong unforgeability, requires that in addition to the above, a forger shouldn’t
even be able to come up with a different signature for a message whose signature 
he has already seen. The schemes presented in this paper satisfy this stronger 
notion of unforgeability. 


Identification Schemes. An identification scheme consists of a key-generation 
algorithm and a description of an interactive protocol between a prover, pos- 
sessing the secret key, and verifier possessing the corresponding public key. In 
general, it is required that the verifier accepts the interaction with a prover who 
behaves honestly with probability one, but this definition can be relaxed so that 
sometimes an honest prover is not accepted with some small probability. 

The standard active attack model against identification schemes proceeds in 
two phases [5]. In the first phase, the adversary interacts with the prover in an 
effort to obtain some information. In the second stage, the adversary plays the 
role of the prover and tries to make a verifier accept the interaction. We remark 
that in the second stage, the adversary no longer has access to the honest prover. 
The adversary succeeds if he is able to make an honest verifier accept with some 
non-negligible probability. 


Witness-Indistinguishability. We will only define the concept of witness- 
indistinguishability in a way that pertains to our application and we refer the 
reader to for the more general definition. For convenience, we will use the 
notation from the identification protocol in Figure 1. An identification scheme
is said to be perfectly witness-indistinguishable if for any public key S, and any
two valid secret keys s, s’ (i.e. s, s’ ∈ D_s and g^s mod N = g^{s’} mod N = S), the
view of any (possibly malicious) verifier in the interaction where the prover uses 
s has the exact same distribution as the view where the prover uses s’. In other 
words, it is impossible for the verifier to figure out which of the valid secret keys 
the prover is using to authenticate himself. 


3 Lattice-Based Constructions 


In this section, we present our lattice-based identification (Figure 3) and
signature (Figure 4) schemes. In Figure 2, we define all the parameters that will
appear in this section as well as give some concrete instantiations. The parameter
κ controls the size of the domain from which the challenges/signatures
come. In order to have soundness error of at most 2^{−80}, this parameter must be
set such that the size of this domain is 2^160. The parameter p is chosen such that
every public key has a very high probability of having multiple corresponding
secret keys associated with it. The free parameters n, m, and σ need to be set
in a way so that it is computationally infeasible to find collisions in the underlying
hash function family H(R, D, m).

The last two lines of the above table deal with the practical cryptanalysis 
of our signature scheme. The last line of the table specifies the length of the 
shortest vector in a certain lattice defined by our signature scheme that can be 
found in practice, while the line above that specifies the length of the vector that 
needs to be found in order to forge a signature. See Section 3.3 for more details.
Fig. 2. Lattice-Based Schemes’ Parameter Definitions and Sample Instantiations


3.1 Identification Scheme


The secret key of the prover, denoted ŝ, consists of a set of m polynomials
from the set D, which are picked uniformly at random. The public key of the
prover consists of a hash function h which is picked randomly from the family
H(R, D, m), and the polynomial S = h(ŝ). We point out that it is not necessary
for every prover to have a distinct h. If trusted randomness is available, then 
everyone can share the same random h which considerably lowers the public 
key size because the hash function h can be hard-coded into the signing and 
verification algorithms. 

In the first step of the protocol, the prover picks a random ŷ ∈ D_y^m, and
“commits” to it by sending Y = h(ŷ) to the verifier. The verifier then picks
a random challenge c from D_c and sends it to the prover. The prover then
computes ẑ = ŝc + ŷ. If this result falls into the range G^m, the prover sends it
to the verifier. Otherwise, he aborts the protocol. Upon receiving ẑ, the verifier
accepts the interaction if ẑ ∈ G^m and h(ẑ) = Sc + Y. Using the homomorphic
properties of h (see (1) and (2)), we see that h(ŝc + ŷ) = Sc + Y, and so an
honest prover who does not abort will always be accepted.
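
One run of the protocol can be simulated end to end with toy parameters. The parameter table of Figure 2 is not reproduced in this section, so the concrete sets below are assumptions made for illustration: secret coefficients bounded by σ, masking coefficients bounded by mnσκ, accepted responses bounded by mnσκ − σκ, and challenges with κ coefficients in {−1, 1}. The sizes are tiny and all names are ours.

```python
import random

N_DEG, M, P = 8, 3, 12289          # toy ring Z_P[x]/(x^8 + 1)
SIGMA, KAPPA = 1, 2                # assumed secret bound and challenge weight
B_Y = M * N_DEG * SIGMA * KAPPA    # assumed bound for masking vectors (D_y)
B_G = B_Y - SIGMA * KAPPA          # assumed bound for accepted responses (G)

def center(value):
    r = value % P
    return r - P if r > (P - 1) // 2 else r

def ring_add(v, w):
    return [center(a + b) for a, b in zip(v, w)]

def ring_mul(v, w):
    out = [0] * N_DEG
    for i, vi in enumerate(v):
        for j, wj in enumerate(w):
            k = i + j
            if k < N_DEG:
                out[k] += vi * wj
            else:
                out[k - N_DEG] -= vi * wj     # x^n = -1
    return [center(c) for c in out]

def h(a_hat, z_hat):
    acc = [0] * N_DEG
    for a, z in zip(a_hat, z_hat):
        acc = ring_add(acc, ring_mul(a, z))
    return acc

def small_vector(bound):
    return [[random.randint(-bound, bound) for _ in range(N_DEG)] for _ in range(M)]

def random_challenge():
    c = [0] * N_DEG
    for position in random.sample(range(N_DEG), KAPPA):
        c[position] = random.choice([-1, 1])
    return c

def in_G(z_hat):
    return all(abs(coeff) <= B_G for z in z_hat for coeff in z)

def keygen():
    half = (P - 1) // 2
    a_hat = [[random.randint(-half, half) for _ in range(N_DEG)] for _ in range(M)]
    s_hat = small_vector(SIGMA)
    return a_hat, s_hat, h(a_hat, s_hat)

def run_protocol(a_hat, s_hat, S):
    """Return None if the prover aborts, else the verifier's decision."""
    y_hat = small_vector(B_Y)                  # prover: masking vector
    Y = h(a_hat, y_hat)                        # prover: commitment
    c = random_challenge()                     # verifier: challenge
    z_hat = [ring_add(ring_mul(s, c), y) for s, y in zip(s_hat, y_hat)]
    if not in_G(z_hat):
        return None                            # prover aborts
    return in_G(z_hat) and h(a_hat, z_hat) == ring_add(ring_mul(S, c), Y)

a_hat, s_hat, S = keygen()
results = [run_protocol(a_hat, s_hat, S) for _ in range(2000)]
assert all(r is None or r is True for r in results)   # non-aborted runs always verify
accepted = sum(1 for r in results if r is True)
print(accepted / len(results))   # empirical acceptance rate of the honest prover
```

Under these assumed bounds each of the mn response coefficients survives with probability 1 − 4/97, so the acceptance rate should come out in the neighborhood of 1/e.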

Proving the soundness and completeness of the protocol is done using the 
following series of steps: 


1. Show that an honest prover is accepted with probability 1/e. 

2. Show that the ID scheme is perfectly witness-indistinguishable. 

3. Show that with probability 1 − 2^{−128}, for a randomly-picked ŝ ∈ D^m, there
is another ŝ' ∈ D^m such that h(ŝ) = h(ŝ').

4. Show how to extract a collision in h from an adversary who succeeds in
breaking the protocol.


Step 1 shows that the completeness of the protocol is 1/e. We will explain 
how to increase this number later. Step 2 is essentially the main part of the 
proof, which shows that for every pair of possible secret keys ŝ, ŝ' such that
S = h(ŝ) = h(ŝ'), no adversarial verifier can determine which secret key is being
used by the prover. The reason for this is that we have set up the parameters
so that for every secret key ŝ ∈ D^m, every challenge c ∈ D_c, and every response
ẑ ∈ G^m, the value of ẑ − ŝc is in D_y^m. This implies that having seen the history
(Y, c, ẑ), it is impossible to tell whether the secret key was ŝ and we picked
a masking parameter ŷ, or the secret key was ŝ' and we picked the masking
parameter ŷ' = ẑ − ŝ'c = ŷ + ŝc − ŝ'c = ŷ + (ŝ − ŝ')c, because h(ŝ) = h(ŝ') = S
and h(ŷ) = h(ŷ') = Y.

To make the claim in step 2 non-vacuous, we need to show that for a randomly 
picked secret key, there is indeed a high probability that another secret key exists 
which produces the same public key. This is done in step 3. 

In step 4, we show how to use a successful adversary to solve the Col(h, D)
problem for a random $h \in H(R, D, m)$. Given a random $h \in H(R, D, m)$, we pick
a random secret key $\hat{s}$ and publish the public keys $h$ and $S = h(\hat{s})$. In the
first stage of the attack, the adversary plays the role of the verifier, and we are
able to perfectly play the part of the prover because we know the secret key. In the
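Private key: $\hat{s} \xleftarrow{\$} D_s^m$
Public key: $h \xleftarrow{\$} H(R, D, m)$, $S \leftarrow h(\hat{s})$

Prover                                                       Verifier
$\hat{y} \xleftarrow{\$} D_y^m$, $Y \leftarrow h(\hat{y})$        --- $Y$ --->
                                                <--- $c$ ---       $c \xleftarrow{\$} D_c$
$\hat{z} \leftarrow \hat{s}c + \hat{y}$
if $\hat{z} \notin G^m$ then $\hat{z} \leftarrow \perp$           --- $\hat{z}$ --->
                                                Accept iff $\hat{z} \in G^m$ and $h(\hat{z}) = Sc + Y$


Fig. 3. Lattice-Based Identification Scheme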


second stage, when the adversary attempts to impersonate the prover, we receive
his commitment and send a random challenge $c \in D_c$. After he responds with
$\hat{z}$, we rewind and pick another random challenge $c' \in D_c$, to which the
adversary will respond with $\hat{z}'$. Both responses verify against the same
commitment $Y$, so $h(\hat{z}) - Sc = Y = h(\hat{z}') - Sc'$, and our knowledge of the
secret key (with $S = h(\hat{s})$) turns this into the equation
$h(\hat{z} - \hat{s}c) = h(\hat{z}' - \hat{s}c')$. By our choice of parameters, both
$\hat{z} - \hat{s}c$ and $\hat{z}' - \hat{s}c'$ are in $D^m$, and because of the
witness-indistinguishability of the protocol, the adversary cannot know our exact
secret key. Therefore with probability at least 1/2, $\hat{z} - \hat{s}c$ and
$\hat{z}' - \hat{s}c'$ will be distinct and we have a collision for $h$. Thus an
adversary who can break the ID scheme can be used to solve Col(h, D) for random
$h \in H(R, D, m)$, and by Theorem [N this implies finding an approximate shortest
vector in all $(x^n + 1)$-cyclic lattices.


Theorem 2. If the identification scheme in Figure 3 is insecure against active
attacks for the parameters in Table A then there is a polynomial-time algorithm
that can solve $\mathrm{SVP}_\gamma(\Lambda)$ for $\gamma = \tilde{O}(n^2)$ for every lattice
$\Lambda$ corresponding to an ideal in the ring $\mathbb{Z}[x]/(x^n + 1)$.


Notice that the ID scheme is not quite satisfactory because a valid prover is
only accepted with probability 1/e. This means that the scheme may have to
be repeated several times until the prover succeeds. Because we showed that the
scheme is witness-indistinguishable, the repetitions can be performed in parallel,
and the witness-indistinguishability property will still be preserved [6]. So the
straightforward way to modify the ID scheme would be, for example, to pick 30
different $\hat{y}_i$’s and send the $Y_i = h(\hat{y}_i)$ to the verifier. Then the verifier
will send 30 challenges, and the prover replies to the first one of these challenges
that he can. This would result in a protocol where the honest prover is accepted
with probability about $1 - (1 - 1/e)^{30} \approx 1 - 2^{-20}$.
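The failure probability of the 30-fold parallel repetition can be checked numerically. The helper below is a small sketch (the function name is hypothetical) that assumes the attempts abort independently, each being accepted with probability 1/e.

```python
import math

def parallel_failure_log2(repetitions, accept_prob=1 / math.e):
    """log2 of the probability that all `repetitions` independent attempts abort."""
    return repetitions * math.log2(1 - accept_prob)

# For 30 repetitions the exponent is close to -20.
print(round(parallel_failure_log2(30), 2))
```

The same helper shows how many commitments are needed for a target completeness error, for instance by increasing `repetitions` until the returned exponent drops below the desired bound.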

But there are some significant improvements that can be made. First of all,
the verifier needs to send only one challenge, rather than one challenge for every
commitment (this is because we show that for every challenge $c$, the probability
of aborting is equal over the random choice of $\hat{y}$). And secondly, we can use a
standard trick to shorten the length of every $Y_i$, which will result in large
savings in our protocol because the length of each $Y_i$ is approximately $n \log p$
bits, which could be as large as 100,000 bits! Instead of sending $Y_i$, we can send
$H(Y_i)$, where $H$ is any collision-resistant hash function. Unlike with $h$, we will
not need $H$ to
Signing Key: $\hat{s} \xleftarrow{\$} D_s^m$
Verification Key: $h \xleftarrow{\$} H(R, D, m)$, $S \leftarrow h(\hat{s})$
Random Oracle: $H : \{0,1\}^* \to D_c$

Sign($\mu$, $h$, $\hat{s}$)                              Verify($\mu$, $\hat{z}$, $e$, $h$, $S$)
1: $\hat{y} \xleftarrow{\$} D_y^m$                         1: Accept iff
2: $e \leftarrow H(h(\hat{y}), \mu)$                         $\hat{z} \in G^m$ and $e = H(h(\hat{z}) - Se, \mu)$
3: $\hat{z} \leftarrow \hat{s}e + \hat{y}$
4: if $\hat{z} \notin G^m$, then goto step 1
5: output $(\hat{z}, e)$


Fig. 4. Lattice-Based Signature Scheme


have any algebraic properties like (1) and (2), so $H$ could be a cryptographic
hash function such as SHA or an efficient lattice-based hash function whose
output is about 512 bits. So sending 30 $H(Y_i)$’s will only require about
15,000 bits in total. In this modified protocol, the verifier’s challenge and the
prover’s reply remain the same as in the old protocol. But to authenticate the
prover, the verifier checks whether $\hat{z} \in G^m$ and whether $H(h(\hat{z}) - Sc)$ is
equal to some $H(Y_i)$ sent by the prover in the first step (see footnote 2). It can
be shown that an adversary who breaks this protocol can be used to find a collision
either in $H$ or in $h$. We will give more details in the full version of the paper.


3.2 Signature Scheme 


Our signature scheme is presented in Figure 4. The public and secret keys are just
like in the ID scheme. To sign a message $\mu$, we pick a random $\hat{y}$, compute
$e = H(h(\hat{y}), \mu)$ and $\hat{z} = \hat{s}e + \hat{y}$, and send $(\hat{z}, e)$ as the
signature only if $\hat{z}$ is in the set $G^m$. Otherwise we repeat the procedure until
$\hat{z}$ ends up in $G^m$. The probability that we succeed in getting $\hat{z}$ to be in
$G^m$ on any particular try is the same as the probability that the ID scheme in
Figure 3 does not send $\perp$, which is 1/e. So we expect to repeat the signing
procedure fewer than 3 times (about 2.72 tries on average) to get a signature.
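The signing loop and the verification equation of Figure 4 can be sketched as follows. This is a toy illustration with insecure parameters: the sets $D_s$, $D_c$, $D_y$, $G$ are modelled by assumed coefficient bounds, and the random oracle $H$ is replaced by a stand-in built from SHA-256 (the helper `oracle` is hypothetical and only meant to produce a deterministic sparse challenge).

```python
import hashlib
import random

N, M, P = 16, 2, 12289          # toy ring degree n, vector length m, modulus p
SIGMA, KAPPA = 1, 4             # assumed bounds for D_s and D_c
B = M * N * SIGMA * KAPPA       # assumed D_y: coefficients in [-B, B]
G_BOUND = B - SIGMA * KAPPA     # assumed G:   coefficients in [-G_BOUND, G_BOUND]

def mul(a, b):
    """Product in Z[x]/(x^N + 1)."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def h(key, vec):
    """h(v) = sum_i a_i * v_i mod P."""
    acc = [0] * N
    for a_i, v_i in zip(key, vec):
        acc = [x + y for x, y in zip(acc, mul(a_i, v_i))]
    return [c % P for c in acc]

def oracle(Y, message):
    """Stand-in for H: hash (Y, message) to a sparse +/-1 challenge."""
    digest = hashlib.sha256(repr(Y).encode() + b"|" + message).digest()
    rng = random.Random(digest)
    e = [0] * N
    for idx in rng.sample(range(N), KAPPA):
        e[idx] = rng.choice((-1, 1))
    return e

def keygen(rng):
    key = [[rng.randrange(P) for _ in range(N)] for _ in range(M)]
    s = [[rng.randint(-SIGMA, SIGMA) for _ in range(N)] for _ in range(M)]
    return key, s, h(key, s)

def sign(key, s, message, rng):
    """Repeat steps 1-4 until z lands in G^m; also report the number of tries."""
    attempts = 0
    while True:
        attempts += 1
        y = [[rng.randint(-B, B) for _ in range(N)] for _ in range(M)]
        e = oracle(h(key, y), message)
        z = [[u + v for u, v in zip(mul(s_i, e), y_i)] for s_i, y_i in zip(s, y)]
        if all(abs(c) <= G_BOUND for z_i in z for c in z_i):
            return (z, e), attempts

def verify(key, S, message, signature):
    z, e = signature
    if any(abs(c) > G_BOUND for z_i in z for c in z_i):
        return False
    Y = [(u - v) % P for u, v in zip(h(key, z), mul(S, e))]   # h(z) - S*e
    return e == oracle(Y, message)
```

A typical use is `key, s, S = keygen(rng)` followed by `sig, tries = sign(key, s, b"message", rng)` and `verify(key, S, b"message", sig)`; averaged over many messages, `tries` should be close to $e \approx 2.72$ for these assumed sets.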

The witness-indistinguishability of the signature scheme follows directly from 
the witness indistinguishability of the ID scheme because the challenge is now 
simply generated by a random oracle rather than by the verifier. The proof 
of security of the signature scheme uses the forking lemma [BJ] to obtain two 
signatures from a forger that use the same random oracle query. Then using the 
same ideas as in the security proof of the ID scheme, it can be shown how to 
use these signatures to obtain a solution to the Col(h, D) problem for a random
$h \in H(R, D, m)$.


Theorem 3. If the signature scheme in Figure 4 for the parameters in Table A
is not strongly unforgeable, then there is a polynomial-time algorithm that can
solve $\mathrm{SVP}_\gamma(\Lambda)$ for $\gamma = \tilde{O}(n^2)$ for every lattice $\Lambda$
corresponding to an ideal in the ring $\mathbb{Z}[x]/(x^n + 1)$.


2 One could lower the communication complexity even further by combining the 30
hashes into a hash tree.
3.3 Concrete Parameters


The security of our ID (and signature) scheme depends on its soundness and
the hardness of finding collisions in hash functions from a certain family. As
mentioned earlier, we set the parameters $\kappa$ and $p$ such that the soundness error
is at most $2^{-80}$. We now discuss how to set the remaining parameters so that
finding collisions in the resulting hash function is infeasible with the techniques
known today. For this, we will use the work of [B], who showed that, given
a reasonable amount of time, algorithms for finding short vectors in random
lattices produce a vector that is no smaller than $1.01^n$ times the shortest vector
of the lattice.

We showed that an adversary who succeeds in forging a signature can be
used to find a collision in a hash function chosen randomly from $H(R, D, m)$.
This is equivalent to finding “short” vectors in a certain lattice which we will now
define. For a polynomial $a \in \mathbb{Z}_p[x]/(x^n + 1)$, let Rot(a) be the $n \times n$
matrix whose $i$-th column is the polynomial $a x^i$, and let $A$ be the $n \times nm$
matrix $A = [\mathrm{Rot}(a_1) \| \mathrm{Rot}(a_2) \| \dots \| \mathrm{Rot}(a_m)]$, where the $a_i$
are the polynomials which define the hash function $h$. If we define the lattice
$\Lambda_p^\perp(A) = \{u \in \mathbb{Z}^{mn} : Au = 0 \pmod p\}$, then finding a vector
$u \in \Lambda_p^\perp(A)$ whose $\ell_\infty$ norm is at most $2mn\sigma\kappa$ is equivalent
to finding a collision in $h \in H(R, D, m)$.
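The correspondence between collisions in $h$ and short vectors of $\Lambda_p^\perp(A)$ can be illustrated on a tiny instance. The sketch below uses toy values ($n = 4$, $m = 2$, $p = 5$, inputs with coefficients in $\{-1, 0, 1\}$), all of which are assumptions chosen so that a collision can be found by exhaustive search; the helper names are hypothetical.

```python
import itertools
import random

N, M, P = 4, 2, 5     # toy degree n, number of polynomials m, modulus p

def mul(a, b):
    """Multiply in Z_p[x]/(x^N + 1)."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            sign, k = (1, i + j) if i + j < N else (-1, i + j - N)
            out[k] = (out[k] + sign * ai * bj) % P
    return out

def h(key, vec):
    """h(v) = sum_i a_i * v_i in Z_p[x]/(x^N + 1)."""
    acc = [0] * N
    for a_i, v_i in zip(key, vec):
        acc = [(x + y) % P for x, y in zip(acc, mul(a_i, v_i))]
    return acc

def rot(a):
    """n x n matrix whose i-th column holds the coefficients of a * x^i."""
    cols = [mul(a, [1 if j == i else 0 for j in range(N)]) for i in range(N)]
    return [[cols[j][r] for j in range(N)] for r in range(N)]

def build_A(key):
    """A = [Rot(a_1) || ... || Rot(a_m)], an n x nm matrix."""
    blocks = [rot(a) for a in key]
    return [sum((blk[r] for blk in blocks), []) for r in range(N)]

def mat_vec(A, u):
    return [sum(x * y for x, y in zip(row, u)) % P for row in A]

def find_collision(key):
    """Exhaustive search; a collision must exist because 3^(nm) > p^n here."""
    seen = {}
    for flat in itertools.product((-1, 0, 1), repeat=N * M):
        vec = [list(flat[i * N:(i + 1) * N]) for i in range(M)]
        digest = tuple(h(key, vec))
        if digest in seen:
            return seen[digest], flat
        seen[digest] = flat
    return None

rng = random.Random(0)
key = [[rng.randrange(P) for _ in range(N)] for _ in range(M)]
A = build_A(key)
y1, y2 = find_collision(key)
u = [a - b for a, b in zip(y1, y2)]   # difference of the two colliding inputs
```

The difference `u` of the two colliding inputs is a non-zero vector with `mat_vec(A, u)` equal to zero modulo $p$ and $\ell_\infty$ norm at most twice the input bound, which mirrors the role of the bound $2mn\sigma\kappa$ in the text.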

The random lattices on which the experiments of [B] were run differ from
$\Lambda_p^\perp(A)$, but in [25], experiments were run on lattices that are very similar
to $\Lambda_p^\perp(A)$, which obtained the same results as [B]. Furthermore, it has been
shown that it is inefficient to try to find a short vector in $\Lambda_p^\perp(A)$ by using
all its $mn$ dimensions. Rather, one should only use the first
$\sqrt{n \log p / \log 1.01}$ dimensions and zero out the others. This results in a
vector whose $\ell_2$ length is $\min\{p, 2^{2\sqrt{n \log p \log 1.01}}\}$ and whose
$\ell_\infty$ norm is at least


$\min\{p, 2^{2\sqrt{n \log p \log 1.01}} \cdot (n \log p / \log 1.01)^{-1/4}\}$     (3)


Since solving the Col(h, D) problem is equivalent to finding an element $\hat{y}$ such
that $h(\hat{y}) = 0$ and $\|\hat{y}\|_\infty \le 2mn\sigma\kappa$, we want to make sure that
when we set our parameters, the value of $2mn\sigma\kappa$ is smaller than the value in
(3). In the instantiation of the scheme that produces a signature of length
approximately 49000 bits, the value of $2mn\sigma\kappa$ is around $2^{23.5}$, while the
value of the shortest vector (in the $\ell_\infty$ norm) that can be found according to
(3) is around 2?°-° (see the last two lines of the table in Figure JJ).
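Expression (3) is easy to evaluate for candidate parameters. The helper below is a sketch (its name and the sample inputs are hypothetical, not the paper's parameter sets); it works with base-2 logarithms so that the results are directly exponents of 2.

```python
import math

def reduction_estimate(n, log2_p, delta=1.01):
    """Return (dimensions used, log2 of the l2 length, log2 of the l_inf bound in (3))."""
    log2_delta = math.log2(delta)
    dims = math.sqrt(n * log2_p / log2_delta)
    exponent = 2 * math.sqrt(n * log2_p * log2_delta)
    log2_l2 = min(log2_p, exponent)
    log2_linf = min(log2_p, exponent - 0.25 * math.log2(n * log2_p / log2_delta))
    return dims, log2_l2, log2_linf

# Hypothetical example: n = 512 and a 32-bit modulus.
dims, log2_l2, log2_linf = reduction_estimate(512, 32)
```

To check a parameter choice one would compare `log2_linf` against $\log_2(2mn\sigma\kappa)$: collisions are out of reach of the estimated attack as long as the latter is the smaller of the two.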

We hope that our work provides further motivation for studying lattice-reduction
algorithms for lattices of the form $\Lambda_p^\perp(A)$, which also happen to be
central to the cryptanalysis of other lattice-based schemes such as 2O§T9IT5). 


3 The lattices in [25] were just like $\Lambda_p^\perp(A)$, except each entry of $A$ was
chosen uniformly at random modulo $p$. Since the currently best lattice-reduction
algorithms don’t “see” the algebraic structure of the lattice, it is very reasonable
to assume that their performance will be the same on our lattices and the lattices
in [25]. Of course it’s possible that a different algorithm that has yet to be
discovered will be able to use the algebraic structure of $A$ to achieve better
results.
4 Factoring-Based Constructions


We now present a modification of a signature scheme presented in [31] whose
security is based on the hardness of factoring. We will need the following two
definitions from [31].


Definition 5. A prime $p$ is said to be $\alpha$-strong if $p = 2r + 1$ where $r$ is an
integer whose prime factors are all greater than $\alpha$.


Definition 6. Let $N = pq$, where $p$ and $q$ are primes. Then an element
$g \in \mathbb{Z}_N^*$ is said to be an asymmetric basis if the parity of ord(g) in
$\mathbb{Z}_p^*$ differs from the parity of ord(g) in $\mathbb{Z}_q^*$.


Both schemes are presented in Figure 5 (our scheme only differs from that in
[31] by the addition of line 4), and the parameters in [31] as well as our modified
parameters are presented in Figure 6. We point out that the scheme of [31] is a
variant of Girault’s scheme [10], and our technique of shortening the signature
length would apply equally well to all its variants [OBL] as well as to the
blind signature constructed in [BY].

The signature of a message $\mu$ consists of the pair $(z, e)$. In the non-aborting
version of the protocol, $z$ has length $k + k' + \log\sigma = 360$ bits, while
in our protocol the length is $k + 1 + \log\sigma = 297$ bits. The savings are
essentially due to the fact that we can pick $y$ in a much smaller range, and the
fact that we are allowed to abort keeps the scheme secure.

If in step 4, $z$ is not in $G$, then the signing procedure has to be repeated.
It can be shown that this happens with probability 1/2. So we expect to run
the signing protocol twice for every signature. But if we assume that off-line
computations (i.e. computations before receiving the message) are free, then we
can change the protocol so that we expect to compute just one extra random
oracle query over the non-aborting signature scheme. The way to do this is to
always keep several $y_i$ and $g^{y_i} \bmod N$ stored along with the ranges that $e$
would have to fall into so that $se + y_i \in G$ (the range is just $(G - y_i)/s$). Then
when we are asked to sign a message $\mu$, we compute $e = H(g^{y_1} \bmod N, \mu)$ and
then check


Secret Key: $s \xleftarrow{\$} D_s$
Public Key: $N$, $g$, and $S \leftarrow g^s \bmod N$
Random Oracle: $H : \{0,1\}^* \to D_e$

Sign($\mu$, $N$, $g$, $s$)                         Verify($\mu$, $z$, $e$, $N$, $g$, $S$)
1: $y \xleftarrow{\$} D_y$                           1: Accept iff $e = H(g^z S^{-e} \bmod N, \mu)$
2: $e \leftarrow H(g^y \bmod N, \mu)$
3: $z \leftarrow se + y$
4: [if $z \notin G$, then goto step 1]
5: output $(z, e)$


Fig. 5. Factoring-Based Signature Schemes. Line 4 is only executed in the aborting
scheme.
[Table: variable definitions and sizes (in bits) for the scheme without aborting
and “With Aborting”. Recoverable entries: N is a 1024-bit product of two 2"-strong
primes; g is an asymmetric basis in $\mathbb{Z}_N^*$ such that ord(g) has 160 bits.]


Fig. 6. Factoring-Based Scheme’s Variable Definitions


whether it’s in the valid range of $y_1$. If it is, then we compute $se + y_1$ and
output it. If it’s not, then we recompute $e$ using $y_2$, and so on. The important
thing to note is that we only compute $se + y_i$ once, and we still expect to succeed
after two tries. As an added bonus, we only use up one $y_i$ per message, since the
$y_i$ that “didn’t work” can be safely tried for the next message.
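The off-line/on-line split described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the modulus is a tiny product of two primes, `G_BASE` merely stands in for an asymmetric basis, $D_y$ is taken to be $[0, 2 \cdot \mathrm{LOW})$ and $G$ to be $[\mathrm{LOW}, 2 \cdot \mathrm{LOW})$ with $\mathrm{LOW}$ an upper bound on $se$ (so that an attempt succeeds with probability 1/2), and the random oracle is replaced by SHA-256.

```python
import hashlib
import random

# Toy, insecure parameters (assumptions for illustration only).
N_MOD = 1019 * 1187               # N = pq for two small primes
G_BASE = 4                        # stand-in for the asymmetric basis g
S_BITS, E_BITS = 12, 8            # secret s in [1, 2^12), challenge e in [0, 2^8)
LOW = (2 ** S_BITS) * (2 ** E_BITS)   # strict upper bound on s * e
Y_RANGE = 2 * LOW                 # assumed D_y = [0, Y_RANGE), G = [LOW, Y_RANGE)

def oracle(commitment, message):
    """Stand-in for the random oracle H, mapping into D_e = [0, 2^E_BITS)."""
    data = commitment.to_bytes(8, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (2 ** E_BITS)

def precompute(s, count, rng):
    """Off-line phase: store y_i, g^{y_i} mod N and the range of e with s*e + y_i in G."""
    pool = []
    for _ in range(count):
        y = rng.randrange(Y_RANGE)
        e_min = -((y - LOW) // s)            # ceil((LOW - y) / s)
        e_max = (Y_RANGE - 1 - y) // s       # floor((Y_RANGE - 1 - y) / s)
        pool.append((y, pow(G_BASE, y, N_MOD), e_min, e_max))
    return pool

def sign(s, message, pool):
    """On-line phase: one oracle query per candidate, a single s*e + y_i at the end."""
    for idx, (y, commitment, e_min, e_max) in enumerate(pool):
        e = oracle(commitment, message)
        if e_min <= e <= e_max:
            pool.pop(idx)                    # only the y_i that worked is used up
            return s * e + y, e
    raise RuntimeError("pool exhausted; precompute more y_i")

def verify(S, message, signature):
    z, e = signature
    commitment = (pow(G_BASE, z, N_MOD) * pow(S, -e, N_MOD)) % N_MOD
    return e == oracle(commitment, message)
```

Candidates that were skipped stay in `pool` and can be tried again for the next message, which is the "added bonus" mentioned in the text. The modular inverse via `pow(S, -e, N_MOD)` requires Python 3.8 or later.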


Theorem 4. An adversary who breaks the aborting signature scheme in T steps 
can be used to factor N in poly(T) steps. 
Abstract. We describe public key encryption schemes with security
provably based on the worst case hardness of the approximate Shortest
Vector Problem in some structured lattices, called ideal lattices. Under
the assumption that the latter is exponentially hard to solve even with a
quantum computer, we achieve CPA-security against subexponential attacks,
with (quasi-)optimal asymptotic performance: if $n$ is the security
parameter, both keys are of bit-length $\tilde{O}(n)$ and the amortized costs of
both encryption and decryption are $\tilde{O}(1)$ per message bit. Our construction
adapts the trapdoor one-way function of Gentry et al. (STOC’08),
based on the Learning With Errors problem, to structured lattices. Our
main technical tools are an adaptation of Ajtai’s trapdoor key generation
algorithm (ICALP’99) and a re-interpretation of Regev’s quantum
reduction between the Bounded Distance Decoding problem and sampling
short lattice vectors.


1 Introduction 


Lattice-based cryptography has been rapidly developing in the last few years,
inspired by the breakthrough result of Ajtai in 1996 [1], who constructed a one-way
function with average-case security provably related to the worst-case complexity
of hard lattice problems. The attractiveness of lattice-based cryptography stems
from its provable security guarantees, well studied theoretical underpinnings,
simplicity and potential efficiency (Ajtai’s one-way function is a matrix-vector
multiplication over a small finite field), and also the apparent security against
quantum attacks. The main complexity assumption is the hardness of approximate
versions of the Shortest Vector Problem (SVP). The $\mathrm{GapSVP}_{\gamma(n)}$ problem
consists in, given a lattice of dimension $n$ and a scalar $d$, replying YES if there
exists a non-zero lattice vector of norm $\le d$ and NO if all non-zero lattice vectors
have norm $> \gamma(n) d$. The complexity of $\mathrm{GapSVP}_{\gamma(n)}$ increases with $n$,
but decreases with $\gamma(n)$. Although the latter is believed to be exponential in $n$
for any polynomial $\gamma(n)$, minimizing the degree of $\gamma(n)$ is very important in
practice, to allow the use of a practical dimension $n$ for a given security level.

LATTICE-BASED PUBLIC KEY ENCRYPTION. The first provably secure lattice-based
cryptosystem was proposed by Ajtai and Dwork [B], and relied on a variant
of GapSVP in arbitrary lattices (it is now known to also rely on GapSVP [19]).
Subsequent works proposed more efficient alternatives [33, 30, 9, 28]. The current
state of the art is a scheme with public/private key length $\tilde{O}(n^2)$ and
encryption/decryption throughput of $\tilde{O}(n)$ bit operations per message bit. Its
security relies on the quantum worst-case hardness of $\mathrm{GapSVP}_{\tilde{O}(n^{1.5})}$ in
arbitrary lattices. The security can be de-quantumized at the expense of both
increasing $\gamma(n)$ and decreasing the efficiency, or relying on a new and less
studied problem [28]. In parallel to the provably secure schemes, there have also
been heuristic proposals [ITJ]. In particular, unlike the above schemes which use
unstructured random lattices, the NTRU encryption scheme [2] exploits the
properties of structured lattices to achieve high efficiency with respect to key
length ($\tilde{O}(n)$ bits) and encryption/decryption cost ($\tilde{O}(1)$ bit operations per
message bit). Unfortunately, its security remains heuristic and it was an important
open challenge to provide a provably secure scheme with comparable efficiency.


PROVABLY SECURE SCHEMES FROM IDEAL LATTICES. Micciancio introduced the class of
structured cyclic lattices, which correspond to ideals in polynomial rings
$\mathbb{Z}[x]/(x^n - 1)$, and presented the first provably secure one-way
function based on the worst-case hardness of the restriction of Poly(n)-SVP to
cyclic lattices. (The problem $\gamma$-SVP consists in computing a non-zero vector of
a given lattice, whose norm is no more than $\gamma$ times larger than the norm of
a shortest non-zero lattice vector.) At the same time, thanks to its algebraic
structure, this one-way function enjoys high efficiency comparable to the NTRU
scheme ($\tilde{O}(n)$ evaluation time and storage cost). Subsequently, Lyubashevsky
and Micciancio [17] and independently Peikert and Rosen showed how to
modify Micciancio’s function to construct an efficient and provably secure
collision resistant hash function. For this, they introduced the more general class
of ideal lattices, which correspond to ideals in polynomial rings
$\mathbb{Z}[x]/f(x)$. The collision resistance relies on the hardness of the restriction
of Poly(n)-SVP to ideal lattices (called Poly(n)-Ideal-SVP). The average-case
collision-finding problem is a natural computational problem called Ideal-SIS,
which has been shown to be as hard as the worst-case instances of Ideal-SVP.
Provably secure efficient signature schemes from ideal lattices have also been
proposed [SESILIA], but constructing efficient provably secure public key
encryption from ideal lattices was an interesting open problem.


OUR RESULTS. We describe the first provably CPA-secure public key encryption
scheme whose security relies on the hardness of the worst-case instances of
$\tilde{O}(n^2)$-Ideal-SVP against subexponential quantum attacks. It achieves
asymptotically optimal efficiency: the public/private key length is $\tilde{O}(n)$ bits
and the amortized encryption/decryption cost is $\tilde{O}(1)$ bit operations per message
bit (encrypting $\tilde{\Omega}(n)$ bits at once, at a $\tilde{O}(n)$ cost). Our security
assumption is that $\tilde{O}(n^2)$-Ideal-SVP cannot be solved by any subexponential time
quantum algorithm, which is reasonable given the state-of-the-art lattice
algorithms [80]. Note that this is stronger than standard public key cryptography
security assumptions. On the other hand, contrary to most of public key
cryptography, lattice-based cryptography allows security against subexponential
quantum attacks. Our main technical tool is a re-interpretation of Regev’s quantum
reduction between the Bounded Distance Decoding problem (BDD) and sampling
short lattice vectors. Also, by adapting Ajtai’s trapdoor generation algorithm
(or more precisely its recent improvement by Alwen and Peikert [5]) to structured
ideal lattices, we are able to construct efficient provably secure trapdoor
signatures, ID-based identification schemes, CCA-secure encryption and ID-based
encryption. We think these techniques are very likely to find further applications.

Most of the cryptosystems based on general lattices rely on
the average-case hardness of the Learning With Errors (LWE) problem introduced
in [33]. Our scheme is based on a structured variant of LWE, that we
call Ideal-LWE. We introduce novel techniques to circumvent two main difficulties
that arise from the restriction to ideal lattices. Firstly, the previous
cryptosystems based on unstructured lattices all make use of Regev’s worst-case to
average-case classical reduction from BDD to LWE (this is the classical step
in the quantum reduction of [33] from SVP to LWE). This reduction exploits
the unstructured-ness of the considered lattices, and does not seem to carry over
to the structured lattices involved in Ideal-LWE. In particular, the probabilistic
independence of the rows of the LWE matrices allows one to consider a single row
in [33, Cor. 3.10]. Secondly, the other ingredient used in previous cryptosystems,
namely Regev’s reduction from the computational variant of LWE to its
decisional variant, also seems to fail for Ideal-LWE: it relies on the probabilistic
independence of the columns of the LWE matrices.

Our solution to the above difficulties avoids the classical step of the reduction
from [33] altogether. Instead, we use the quantum step to construct a new
quantum average-case reduction from SIS (the unstructured variant of Ideal-SIS)
to LWE. It also works from Ideal-SIS to Ideal-LWE. Combined with the known
reduction from worst-case Ideal-SVP to average-case Ideal-SIS [17], we obtain a
quantum reduction from Ideal-SVP to Ideal-LWE. This shows the hardness of
the computational variant of Ideal-LWE. Because we do not obtain the hardness
of the decisional variant, we use a generic hardcore function to derive
pseudorandom bits for encryption. This is why we need to assume the exponential
hardness of SVP. The encryption scheme follows as an adaptation of [9, Sec. 7.1].

The main idea of our new quantum reduction from Ideal-SIS to Ideal-LWE is
a re-interpretation of Regev’s quantum step in [33]. The latter was presented as
a worst-case quantum reduction from sampling short lattice vectors in a lattice $L$
to solving BDD in the dual lattice $\widehat{L}$. We observe that this reduction is
actually stronger: it is an average-case reduction which works given an oracle for
BDD in $\widehat{L}$ with a normally distributed error vector. Also, as pointed out in
[9], LWE can be seen as a BDD with a normally distributed error in a certain lattice
whose dual is essentially the SIS lattice. This leads to our SIS to LWE reduction.
Finally we show how to apply it to reduce Ideal-SIS to Ideal-LWE; this involves a
probabilistic lower bound for the minimum of the Ideal-LWE lattice. We believe
our new SIS to LWE reduction is of independent interest. Along with [22], it 
provides an alternative to Regev’s quantum reduction from GapSVP to LWE. 
Ours is weaker because the derived GapSVP factor increases with the number 
of LWE samples, but it has the advantage of carrying over to the ideal case. Also, 
when choosing practical parameters for lattice-based encryption (see, e.g., B31), 
it is impractical to rely on the worst-case hardness of SVP. Instead, the practical 
average-case hardness of LWE is evaluated based on the best known attack which 
consists in solving SIS. Our reduction justifies this heuristic by showing that it 
is indeed necessary to (quantumly) break SIS in order to solve LWE. 


ROAD-MAP. We provide some background in Section 2. Section 3 shows how to
hide a trapdoor in the adaptation of SIS to ideal lattices. Section 4 contains the
new reduction between SIS and LWE. Finally, in Section 5, we present our
CPA-secure encryption scheme and briefly describe other cryptographic
constructions.


NOTATION. Vectors will be denoted in bold. We denote by $\langle \cdot, \cdot \rangle$ and
$\|\cdot\|$ the inner product and the Euclidean norm. We denote by $\rho_s(x)$ (resp.
$\nu_s$) the standard $n$-dimensional Gaussian function (resp. distribution) with
center 0 and variance $s$, i.e., $\rho_s(x) = \exp(-\pi \|x\|^2 / s^2)$ (resp.
$\nu_s(x) = \rho_s(x)/s^n$). We use the notations $\tilde{O}(\cdot)$ and
$\tilde{\Omega}(\cdot)$ to hide poly-logarithmic factors. If $D_1$ and $D_2$ are
two probability distributions over a discrete domain $E$, their statistical distance
is $\Delta(D_1, D_2) = \frac{1}{2} \sum_{x \in E} |D_1(x) - D_2(x)|$. If a function $f$
over a countable domain $E$ takes non-negative real values, its sum over an
arbitrary $F \subseteq E$ will be denoted by $f(F)$. If $q$ is a prime number, we denote
by $\mathbb{Z}_q$ the field of integers modulo $q$. We denote by $\Psi_s$ the reduction
modulo $q$ of $\nu_s$.
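The Gaussian function and the statistical distance defined above translate directly into code. The following is a minimal sketch; the function names are hypothetical, and distributions over a discrete domain are represented as dictionaries mapping outcomes to probabilities.

```python
import math

def rho(x, s):
    """Gaussian function rho_s(x) = exp(-pi * ||x||^2 / s^2) for a vector x."""
    return math.exp(-math.pi * sum(c * c for c in x) / (s * s))

def nu(x, s):
    """Density nu_s(x) = rho_s(x) / s^n, where n is the dimension of x."""
    return rho(x, s) / s ** len(x)

def statistical_distance(d1, d2):
    """Delta(D1, D2) = 1/2 * sum over the domain of |D1(x) - D2(x)|."""
    support = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(x, 0.0) - d2.get(x, 0.0)) for x in support)
```

For example, `statistical_distance({0: 0.5, 1: 0.5}, {0: 0.75, 1: 0.25})` evaluates the distance between a fair and a biased coin.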


2 Reminders and Background Results on Lattices 


We refer to BI] for a detailed introduction to the computational aspects of
lattices. In the present section, we briefly remind the reader of some fundamental
properties of lattices that we will need. We then introduce the so-called ideal
lattices, and finally formally define some computational problems. 


Euclidean lattices. An $n$-dimensional lattice $L$ is the set of all integer linear
combinations of some linearly independent vectors $b_1, \dots, b_n \in \mathbb{R}^n$,
i.e., $L = \sum_i \mathbb{Z} b_i$. The $b_i$’s are called a basis of $L$. The $i$th
minimum $\lambda_i(L)$ is the smallest $r$ such that $L$ contains $i$ linearly
independent vectors of norms $\le r$. We let $\lambda_1^\infty(L)$ denote the first
minimum of $L$ with respect to the infinity norm. If $B = (b_1, \dots, b_n)$ is a
basis, we define its norm by $\|B\| = \max_i \|b_i\|$ and its fundamental
parallelepiped by $P(B) = \{\sum_i c_i b_i \mid c \in [0,1)^n\}$. Given a basis $B$
for lattice $L$ and a vector $c \in \mathbb{R}^n$, we define $c \bmod L$ as the unique
vector in $P(B)$ such that $c - (c \bmod L) \in L$ (the basis being implicit). For any
lattice $L$ and any $s > 0$, the sum $\rho_s(L)$ is finite. We define the lattice
Gaussian distribution by $D_{L,s}(b) = \rho_s(b)/\rho_s(L)$, for any $b \in L$. If $L$
is a lattice, its dual $\widehat{L}$ is the lattice
$\{\hat{b} \in \mathbb{R}^n \mid \forall b \in L, \langle \hat{b}, b \rangle \in \mathbb{Z}\}$.
We will use the following results.
Lemma 1 ([29, Lemma 2.11] and Lemma 3.5]). For any $x$ in an $n$-dimensional
lattice $L$ and $s \ge 2\sqrt{\ln(10n)/\pi} / \lambda_1^\infty(\widehat{L})$, we have
$D_{L,s}(x) \le 2^{-n+1}$.


Lemma 2 ([22, Lemma 2.10]). Given an $n$-dimensional lattice $L$, we have
$\Pr_{x \sim D_{L,s}}[\|x\| > s\sqrt{n}] \le 2^{-n+1}$.


Ideal lattices. Ideal lattices are a subset of lattices with the computationally
interesting property of being related to polynomials via structured matrices. The
$n$-dimensional vector-matrix product costs $\tilde{O}(n)$ arithmetic operations instead
of $O(n^2)$. Let $f \in \mathbb{Z}[x]$ be a monic degree $n$ polynomial. For any
$g \in \mathbb{Q}[x]$, there is a unique pair $(q, r)$ with $\deg(r) < n$ and
$g = qf + r$. We denote $r$ by $g \bmod f$ and identify $r$ with the vector
$r \in \mathbb{Q}^n$ of its coefficients. We define $\mathrm{rot}_f(r) \in \mathbb{Q}^{n \times n}$
as the matrix whose rows are the $x^i r(x) \bmod f(x)$’s, for $0 \le i < n$. We extend
that notation to the matrices $A$ over $\mathbb{Q}[x]/f$, by applying $\mathrm{rot}_f$
component-wise. Note that $\mathrm{rot}_f(g_1)\mathrm{rot}_f(g_2) = \mathrm{rot}_f(g_1 g_2)$ for
any $g_1, g_2 \in \mathbb{Q}[x]/f$. The strengths of our cryptographic constructions
depend on the choice of $f$. Its quality is quantified by its expansion factor (we
adapt the definition of [17] to the Euclidean norm):


$\mathrm{EF}(f, k) = \max\left\{ \frac{\|g \bmod f\|}{\|g\|} \;\middle|\; g \in \mathbb{Z}[x] \setminus \{0\} \text{ and } \deg(g) \le k(\deg(f) - 1) \right\}$,

where we identified the polynomial $g \bmod f$ (resp. $g$) with the coefficients vector.
Note that if $\deg(g) < n$, then $\|\mathrm{rot}_f(g)\| \le \mathrm{EF}(f, 2) \cdot \|g\|$. We will
concentrate on the polynomials $x^{2^k} + 1$, although most of our results are more
general. We recall some basic properties of $x^{2^k} + 1$ (see [7] for the last one).


Lemma 3. Let $k \ge 0$ and $n = 2^k$. Then $f(x) = x^n + 1$ is irreducible in
$\mathbb{Q}[x]$. Its expansion factor is $\le \sqrt{2}$. Also, for any
$g = \sum_{i<n} g_i x^i \in \mathbb{Q}[x]/f$, we have $\mathrm{rot}_f(g)^T = \mathrm{rot}_f(\bar{g})$
where $\bar{g} = g_0 - \sum_{1 \le i < n} g_{n-i} x^i$. Furthermore, if $q$ is
a prime such that $2n \mid (q - 1)$, then $f$ has $n$ linear factors in $\mathbb{Z}_q[x]$.
Finally, if $k \ge 2$ and $q$ is a prime with $q \equiv 3 \bmod 8$, then
$f = f_1 f_2 \bmod q$ where each $f_i$ is irreducible in $\mathbb{Z}_q[x]$ and can be
written $f_i = x^{n/2} + t_i x^{n/4} - 1$ with $t_i \in \mathbb{Z}_q$.
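Several identities used here can be checked on small examples. The sketch below assumes $f = x^n + 1$ and integer coefficients; it builds $\mathrm{rot}_f$, checks the multiplicativity and transposition identities for $n = 4$, and checks the shape of the factorization modulo a prime $q \equiv 3 \bmod 8$ by taking $t$ with $t^2 = -2 \bmod q$ and $f_2$ obtained from $f_1$ by negating $t$ (this choice follows from $(x^{n/2} - 1)^2 - t^2 x^{n/2} = x^n + 1$). Irreducibility of the factors is not tested, and all helper names are hypothetical.

```python
N = 4   # n = 2^k with k = 2, so f(x) = x^4 + 1

def mul_mod_f(a, b):
    """Product of two polynomials modulo f = x^N + 1."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
            else:
                out[i + j - N] -= ai * bj
    return out

def rot(g):
    """rot_f(g): the i-th row holds the coefficients of x^i * g mod f."""
    rows, cur = [], list(g)
    for _ in range(N):
        rows.append(cur)
        cur = [-cur[-1]] + cur[:-1]      # multiply by x, using x^N = -1
    return rows

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def conj(g):
    """g_bar = g_0 - sum_{1 <= i < n} g_{n-i} x^i."""
    return [g[0]] + [-g[N - i] for i in range(1, N)]

def plain_mul(a, b, q):
    """Product of two polynomials over Z_q, without reduction modulo f."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out

def lemma3_factors(n, q):
    """Candidate factors x^(n/2) +/- t x^(n/4) - 1 with t^2 = -2 mod q."""
    t = next(t for t in range(q) if (t * t + 2) % q == 0)
    gap = [0] * (n // 4 - 1)
    f1 = [q - 1] + gap + [t] + gap + [1]
    f2 = [q - 1] + gap + [(-t) % q] + gap + [1]
    return f1, f2
```

For instance, `lemma3_factors(4, 11)` returns the coefficient lists of $x^2 + 3x - 1$ and $x^2 - 3x - 1$ modulo 11, whose product is $x^4 + 1$ modulo 11.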


Let $I$ be an ideal of $\mathbb{Z}[x]/f$, i.e., a subset of $\mathbb{Z}[x]/f$ closed under
addition and multiplication by any element of $\mathbb{Z}[x]/f$. It corresponds to a
sublattice of $\mathbb{Z}^n$. An $f$-ideal lattice is a sublattice of $\mathbb{Z}^n$ that
corresponds to an ideal $I \subseteq \mathbb{Z}[x]/f$.


Hard lattice problems. The most famous lattice problem is SVP. Given a basis
of a lattice $L$, it aims at finding a shortest vector in $L \setminus \{0\}$. It can
be relaxed by asking for a non-zero vector that is no longer than $\gamma(n)$ times a
solution to SVP, for a prescribed function $\gamma(\cdot)$. The best polynomial time
algorithm solves $\gamma$-SVP only for a slightly subexponential $\gamma$. When $\gamma$ is
polynomial in $n$, then the most efficient algorithm [4] has an exponential
worst-case complexity both in time and space. If we restrict the set of input
lattices to ideal lattices, we obtain the problem Ideal-SVP (resp.
$\gamma$-Ideal-SVP), which is implicitly parameterized by a sequence of polynomials
$f$ of growing degrees. No algorithm is known to perform non-negligibly better for
Ideal-SVP than for SVP. It is believed that no subexponential quantum algorithm
solves the computational variants of SVP or Ideal-SVP in the worst case. These
worst-case problems can be reduced to the following average-case problems,
introduced in [1] and [33].


Definition 1. The Small Integer Solution problem with parameters $q(\cdot)$, $m(\cdot)$,
$\beta(\cdot)$ ($\mathrm{SIS}_{q,m,\beta}$) is as follows: Given $n$ and a matrix $G$ sampled
uniformly in $\mathbb{Z}_{q(n)}^{m(n) \times n}$, find $e \in \mathbb{Z}^{m(n)} \setminus \{0\}$
such that $e^T G = 0 \bmod q(n)$ (the modulus being taken component-wise) and
$\|e\| \le \beta(n)$. The Ideal Small Integer Solution problem with parameters
$q, m, \beta$ and $f$ ($\text{Ideal-SIS}^f_{q,m,\beta}$) is as follows: Given $n$ and $m$
polynomials $g_1, \dots, g_m$ chosen uniformly and independently in $\mathbb{Z}_q[x]/f$,
find $e_1, \dots, e_m \in \mathbb{Z}[x]$ not all zero such that
$\sum_{i \le m} e_i g_i = 0$ in $\mathbb{Z}_q[x]/f$ and $\|e\| \le \beta$, where $e$ is the
vector obtained by concatenating the coefficients of the $e_i$’s.


The above problems can be interpreted as lattice problems. If
$G \in \mathbb{Z}_q^{m \times n}$, then the set
$G^\perp = \{b \in \mathbb{Z}^m \mid b^T G = 0 \bmod q\}$ is an $m$-dimensional lattice and
solving SIS corresponds to finding a short non-zero vector in it. Similarly,
Ideal-SIS consists in finding a small non-zero element in the
$\mathbb{Z}[x]/f$-module
$M^\perp(g) = \{b \in (\mathbb{Z}[x]/f)^m \mid \langle b, g \rangle = 0 \bmod q\}$, where
$g = (g_1, \dots, g_m)$. It can be seen as a lattice problem by applying the
$\mathrm{rot}_f$ operator. Note that the $m$ of SIS is $n$ times larger than the $m$ of
Ideal-SIS. Lyubashevsky and Micciancio [17] reduced Ideal-SVP to Ideal-SIS. The
approximation factors in [17] are given in terms of the infinity norm. For our
purposes, it is more natural to use the Euclidean norm. To avoid losing a
$\sqrt{n}$ factor by simply applying the norm equivalence formula, we modify the
proof of [17]. We also adapt it to handle the case where the Ideal-SIS solver has a
subexponentially small success probability, at the cost of an additional factor
of $\tilde{O}(\sqrt{n})$ in the SVP approximation factor.
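The passage from Ideal-SIS to SIS through the rot operator can be made concrete with a small check. The sketch below assumes $f = x^n + 1$ and uses a hand-made toy instance ($n = 2$, $q = 5$, $g_2 = x \cdot g_1$), which is of course not a uniformly sampled one; the function names are hypothetical.

```python
import math

def rot(g):
    """rot_f(g) for f = x^n + 1: the i-th row holds the coefficients of x^i * g mod f."""
    n = len(g)
    rows, cur = [], list(g)
    for _ in range(n):
        rows.append(cur)
        cur = [-cur[-1]] + cur[:-1]
    return rows

def ideal_sis_matrix(polys):
    """Stack rot_f(g_1), ..., rot_f(g_m) into the (mn) x n SIS matrix."""
    return [row for g in polys for row in rot(g)]

def is_sis_solution(G, e, q, beta):
    """Check e != 0, e^T G = 0 mod q (component-wise) and ||e|| <= beta."""
    if not any(e):
        return False
    n = len(G[0])
    residues = [sum(e[i] * G[i][j] for i in range(len(G))) % q for j in range(n)]
    return not any(residues) and math.sqrt(sum(c * c for c in e)) <= beta

# Toy instance: g_1 = 1 + 2x and g_2 = 3 + x = x * g_1 mod (x^2 + 1, 5),
# so e_1 = x and e_2 = -1 satisfy e_1 g_1 + e_2 g_2 = 0.
G = ideal_sis_matrix([[1, 2], [3, 1]])
e = [0, 1, -1, 0]          # coefficients of e_1 followed by those of e_2
```

Here the $mn = 4$ rows of `G` illustrate the remark that the $m$ of SIS is $n$ times the $m$ of Ideal-SIS.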


Theorem 1. Suppose that $f$ is irreducible over $\mathbb{Q}$. Let $m = \mathrm{Poly}(n)$
and $q = \tilde{\Omega}(\mathrm{EF}(f, 3) \beta m^2 n)$ be integers. A polynomial-time (resp.
subexponential-time) algorithm solving $\text{Ideal-SIS}^f_{q,m,\beta}$ with
probability $1/\mathrm{Poly}(n)$ (resp. $2^{-o(n)}$) can be used to solve
$\gamma$-Ideal-SVP in polynomial-time (resp. subexponential-time) with
$\gamma = \tilde{O}(\mathrm{EF}^2(f, 2) \beta m n^{1/2})$ (resp.
$\gamma = \tilde{O}(\mathrm{EF}^2(f, 2) \beta m n)$).


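The problem LWE is dual to SIS in the sense that if $G \in \mathbb{Z}_q^{m \times n}$ is
the SIS-matrix, then LWE involves the dual of the lattice $G^\perp$. We have
$\widehat{G^\perp} = \frac{1}{q} L(G)$ where
$L(G) = \{b \in \mathbb{Z}^m \mid \exists s \in \mathbb{Z}_q^n, Gs = b \bmod q\}$.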


Definition 2. The Learning With Errors problem with parameters $q, m$ and a
distribution $\chi$ on $\mathbb{R}/[0, q)$ ($\mathrm{LWE}_{q,m;\chi}$) is as follows: Given $n$,
a matrix $G \in \mathbb{Z}_q^{m \times n}$ sampled uniformly at random and
$Gs + e \in (\mathbb{R}/[0, q))^m$, where $s \in \mathbb{Z}_q^n$ is chosen uniformly at
random and the coordinates of $e \in (\mathbb{R}/[0, q))^m$ are independently
sampled from $\chi$, find $s$. The Ideal Learning With Errors problem with
parameters $q, m$, a distribution $\chi$ on $\mathbb{R}/[0, q)$ and $f$
($\text{Ideal-LWE}^f_{q,m;\chi}$) is the same as above, except that
$G = \mathrm{rot}_f(g)$ with $g$ chosen uniformly in $(\mathbb{Z}_q[x]/f)^m$.
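An LWE instance as in Definition 2 can be generated in a few lines. The sketch below takes $\chi$ to be a continuous Gaussian reduced modulo $q$, parameterized by its standard deviation `s_dev` (an assumption made for the example; with the convention $\rho_s(x) = \exp(-\pi x^2/s^2)$ the standard deviation is $s/\sqrt{2\pi}$). The function names and the sample parameters are hypothetical.

```python
import random

def lwe_instance(n, m, q, s_dev, rng):
    """Return (G, b, secret, errors) with b = G * secret + errors mod q."""
    G = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    secret = [rng.randrange(q) for _ in range(n)]
    errors = [rng.gauss(0.0, s_dev) for _ in range(m)]
    b = [(sum(g * s for g, s in zip(row, secret)) + e) % q
         for row, e in zip(G, errors)]
    return G, b, secret, errors

def centered_residuals(G, b, secret, q):
    """b - G * secret reduced modulo q into the interval (-q/2, q/2]."""
    out = []
    for row, b_i in zip(G, b):
        r = (b_i - sum(g * s for g, s in zip(row, secret))) % q
        out.append(r - q if r > q / 2 else r)
    return out
```

Anyone who knows the secret can recover the error vector with `centered_residuals`, as long as the errors are small compared to $q$; the LWE problem asks for the secret given only `G` and `b`. For Ideal-LWE, `G` would instead be built by stacking the matrices $\mathrm{rot}_f(g_i)$.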


We will use the following results on the LWE and Ideal-LWE lattices. 
Lemma 4. Let $n, m$ and $q$ be integers with $q$ prime, $m \ge 5n \log q$ and
$n \ge 10$. Then for all but a fraction $\le q^{-n}$ of the $G$’s in
$\mathbb{Z}_q^{m \times n}$, we have $\lambda_1^\infty(L(G)) \ge q/4$
and $\lambda_1(L(G)) \ge 0.07 \sqrt{m}\, q$.


Lemma 5. Let $n, m$ and $q$ be integers with $q = 3 \bmod 4$ prime and
$m \ge 41 \log q$ and $n = 2^k \ge 32$. Then for all but a fraction $\le q^{-n}$ of the
$g$’s in $(\mathbb{Z}_q[x]/f)^m$, we have $\lambda_1^\infty(L(\mathrm{rot}_f(g))) \ge q/4$
and $\lambda_1(L(\mathrm{rot}_f(g))) \ge 0.017 \sqrt{mn}\, q$.


3 Hiding a Trapdoor in Ideal-SIS 


In this section we show how to hide a trapdoor in the problem Ideal-SIS.
Ajtai [2] showed how to simultaneously generate a (SIS) matrix
$A \in \mathbb{Z}_q^{m \times n}$ and a (trapdoor) basis
$S = (s_1, \dots, s_m) \in \mathbb{Z}^{m \times m}$ of the lattice
$A^\perp = \{b \in \mathbb{Z}^m : b^T A = 0 \bmod q\}$, with the following properties:


1. The distribution of $A$ is close to the uniform distribution over
$\mathbb{Z}_q^{m \times n}$.
2. The basis vectors $s_1, \dots, s_m$ are short.


Recently, Alwen and Peikert [5] improved Ajtai's construction in the sense that
the created basis has shorter vectors: $\|S\| = O(n \log q)$ with $m = \Omega(n \log q)$
and overwhelming probability, and $\|S\| = O(\sqrt{n \log q})$ with $m = \Omega(n \log^2 q)$.
We modify both constructions to obtain a trapdoor generation algorithm for the
problem Ideal-SIS, with a resulting basis whose norm is as small as the one of [5].

Before describing the construction, we note that the construction of [5] relies
on the Hermite Normal Form (HNF), but that there is no Hermite Normal
Form for the rings considered here. We circumvent this issue by showing that, except
in negligibly rare cases, we may use a matrix which is HNF-like.


Theorem 2. There exists a probabilistic polynomial time algorithm with the following
properties. It takes as inputs $n, \sigma, r$, an odd prime $q$, and integers $m_1, m_2$.
It also takes as input a degree $n$ polynomial $f \in \mathbb{Z}[x]$ and random polynomials
$a_1 \in (\mathbb{Z}_q[x]/f)^{m_1}$. We let $f = \prod_{i \leq t} f_i$ be the factorization of $f$ over $\mathbb{Z}_q$. We
let $\kappa = \lceil 1 + \log q \rceil$, $\Delta = \left(\prod_{i \leq t}\left(1 + \left(\frac{q}{3^r}\right)^{\deg f_i}\right) - 1\right)^{1/2}$ and $m = m_1 + m_2$. The
algorithm succeeds with probability $\geq 1 - p$ over $a_1$, where
$p = \left(1 - \prod_{i \leq t}\left(1 - q^{-\deg f_i}\right)\right)^{\sigma}$. When it does, it returns
$a = \binom{a_1}{a_2} \in (\mathbb{Z}_q[x]/f)^m$ and a basis $S$ of the lattice $\mathrm{rot}_f(a)^\perp$, such that:

1. The distance to uniformity of $a$ is at most $p + m_2 \Delta$.
2. The quality of $S$ is as follows:
- If $m_1 \geq \max\{\sigma, \kappa, r\}$ and $m_2 \geq \kappa$, then $\|S\| \leq \mathrm{EF}(f,2) \cdot \sqrt{2}\,\kappa r^{1/2} n^{3/2}$.
Additionally, $\|S\| \leq \mathrm{EF}(f,2)\sqrt{3a\kappa r} \cdot n$ with probability $1 - 2^{-a + O(\log n m_1 r)}$
for a super-logarithmic function $a = a(n) = \omega(\log n)$.
- If $m_1 \geq \max\{\sigma, \kappa, r\}$ and $m_2 \geq \kappa m_1$, then $\|S\| \leq \mathrm{EF}(f,2)(4\sqrt{nr} + 3)$.
3. In particular, for $f = x^{2^k} + 1$ with $k \geq 2$ and a prime $q$ with $q \equiv 3 \bmod 8$,
the following holds:
- We can set $\sigma = 1$ and $r = \lceil 1 + \log_3 q \rceil$. Then, the error probability is
$p = q^{-\Omega(n)}$ and the parameter $\Delta$ is $2^{-\Omega(n)}$.
- If $m_1, m_2 \geq \kappa$, then $\|S\| \leq \sqrt{6a\kappa r} \cdot n = O(\sqrt{a}\, n \log q)$ with probability
$1 - 2^{-a + O(\log n m_1 r)}$ for a super-logarithmic function $a = a(n) = \omega(\log n)$.
- If $m_1 \geq \kappa$ and $m_2 \geq \kappa m_1$, then $\|S\| \leq \sqrt{2}(4\sqrt{nr} + 3) = O(\sqrt{n \log q})$.


In the rest of this section, we only describe the analog of the second construction
of Alwen and Peikert, i.e., the case $m_2 \geq \kappa m_1$, due to lack of space.


3.1 A Trapdoor for Ideal-SIS 


We now construct the trapdoor for Ideal-SIS. More precisely, we want to simultaneously
construct a uniform $a \in R^m$ with $R = \mathbb{Z}_q[x]/f$, and a small basis $S$
of the lattice $A^\perp$ where $A = \mathrm{rot}_f(a)$. For this, it suffices to find a basis of the
module $M^\perp(a) = \{y \in R_0^m \mid \langle y, a \rangle \equiv 0 \bmod q\}$, with $R_0 = \mathbb{Z}[x]/f$.
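
The correspondence between the module relation $\langle y, a \rangle$ and the matrix $\mathrm{rot}_f(a)$ can be
checked numerically. The sketch below assumes $f = x^n + 1$ and the convention that the
$i$-th row of $\mathrm{rot}_f(a)$ holds the coefficients of $x^i a$ modulo $f$; this convention and
all function names are illustrative assumptions, not taken from the paper.

```python
def polymul_negacyclic(a, b, q):
    """Product of a and b in Z_q[x]/(x^n + 1), coefficients listed low to high."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] = (c[k] + ai * bj) % q
            else:
                c[k - n] = (c[k - n] - ai * bj) % q  # x^n = -1
    return c

def rot(a, q):
    """n x n matrix whose i-th row is the coefficient vector of x^i * a."""
    rows, cur = [], [x % q for x in a]
    for _ in range(len(a)):
        rows.append(cur)
        cur = [(-cur[-1]) % q] + cur[:-1]  # multiply by x modulo x^n + 1
    return rows

def rot_vec(polys, q):
    """Stack rot(a_1), ..., rot(a_m) into an (m*n) x n matrix."""
    return [row for a in polys for row in rot(a, q)]

def row_times_matrix(y, A, q):
    """Row vector y times matrix A, modulo q."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) % q
            for j in range(len(A[0]))]
```

With this convention, multiplying the concatenated coefficient vector of $y = (y_1, \ldots, y_m)$
by `rot_vec([a_1, ..., a_m], q)` yields the coefficients of $\sum_i y_i a_i$, so $y$ lies in
$M^\perp(a)$ exactly when this product vanishes modulo $q$.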


The principle of the design. In the following, for two matrices X and Y, 
[X|Y] denotes the concatenation of the columns of X followed by Y and [X; Y] 
denotes the concatenation of the rows of X and the rows of Y. 

We mainly follow the Alwen-Peikert construction. Let $m_1 \geq \sigma, r$. Let us assume
that we generate random polynomials $A_1 = [a_1, \ldots, a_{m_1}]^T \in R^{m_1 \times 1}$.
We will construct a random matrix $A_2 \in R^{m_2 \times 1}$ with a structured matrix
$S \in R_0^{m \times m}$ such that $SA = 0$ and $S$ is a basis of the module $M^\perp(a)$, where
$A = [A_1; A_2]$. We first construct an HNF-like basis $F$ of the module $M^\perp(a)$ with
$A$. Next, we construct a unimodular matrix $Q$ such that $S = QF$ is a short basis
of the module. More precisely, $S$ has the following form:

$$S = \begin{bmatrix} V & P \\ D & B \end{bmatrix} = \underbrace{\begin{bmatrix} -I_{m_1} & P \\ 0 & B \end{bmatrix}}_{Q} \cdot \underbrace{\begin{bmatrix} H & 0 \\ U & I_{m_2} \end{bmatrix}}_{F}.$$


Note that, by setting B lower triangular with diagonal coefficients equal to 1, 
the matrix Q is unimodular. 
In this design principle, we want $FA = 0$. Hence, we should set

$$H A_1 = 0 \quad \text{and} \quad A_2 = -U A_1.$$

Notice that, in order to prove that $F$ is a basis of $A^\perp$, it suffices to show that
$H$ is a basis of $A_1^\perp$. The first equation is satisfied by setting $H$ to be an HNF-like
matrix (see below). By setting $U = G + R$, with $G$ to be defined later on
and $R$ a random matrix, we have that $A_2$ is almost uniformly random in $R^{m_2 \times 1}$ by
Micciancio's regularity lemma (Lemma 6). More precisely, the $i$-th row of $R$ is
chosen from $(\{-1,0,1\}^n)^r \times (\{0\}^n)^{m_1 - r}$.


Lemma 6 (Adapted from [Th. 4.2]). Let $\mathbb{F}$ be a finite field and $f \in \mathbb{F}[x]$
be monic and of degree $n > 0$. Let $R$ be the ring $\mathbb{F}[x]/f$. Let $D \subseteq \mathbb{F}$ and $r > 0$.
For $a_1, \ldots, a_r \in R$, we denote by $H(a_1, \ldots, a_r)$ the random variable $\sum_{i \leq r} b_i a_i \in R$
where the $b_i$'s are degree $< n$ polynomials with coefficients chosen independently
and uniformly in $D$. If $U_1, \ldots, U_r$ are independent uniform random
variables in $R$, then the statistical distance to uniformity of
$(U_1, \ldots, U_r, H(U_1, \ldots, U_r))$ is below:

$$\frac{1}{2}\sqrt{\prod_{i \leq t}\left(1 + \left(\frac{|\mathbb{F}|}{|D|^r}\right)^{\deg f_i}\right) - 1},$$

where $f = \prod_{i \leq t} f_i$ is the factorization of $f$ over $\mathbb{F}$.


We show below how to choose $P$ and $G$ such that $PG = H - I_{m_1}$. With this
relation, the design principle form of $S$ therefore implies that
$V = -H + P(G + R) = PR - I_{m_1}$ and $D = B(G + R)$. Our constructions for $P, G, B$ also ensure
that $P$, $B$ and $BG$ have 'small' entries so that $S$ has 'small' entries.


A construction of H without HNF. We start with how to construct $H$ for
$A_1 = [a_1, \ldots, a_{m_1}]^T \in R^{m_1 \times 1}$. Since $m_1 \geq \max\{\sigma, \kappa, r\}$, we have $a_{i^*} \in R^*$
for some index $i^*$ with probability at least $1 - p$, where $R^*$ denotes the set of
invertible elements of $R$. For now, we set $i^* = 1$ for simplicity. Using this $a_{i^*}$,
we can construct an HNF-like matrix $H$: the first row is $q e_1$, and the $i$-th row is
$h_i e_1 + e_i$ for $i = 2, \ldots, m_1$, where $e_i$ is a row vector in $R_0^{m_1}$ such that the $i$-th
element is 1 and others are 0, and $h_i = -a_i \cdot a_{i^*}^{-1} \bmod q$ such that $h_i \in [0, q)^n$.
Let $\mathbf{h}_i$ denote the $i$-th row of $H$. By the definition of $H$, $H \cdot A_1 \equiv 0 \bmod q$. Thus,
each row vector $\mathbf{h}_i$ is in $M^\perp(a_1)$, where $a_1 = A_1$. It is obvious that $\mathbf{h}_1, \ldots, \mathbf{h}_{m_1}$
are linearly independent over $R_0$. Hence, we only need to show that $H$ is indeed
a basis of $M^\perp(a_1)$, but this is routine.

Next, we consider the case where $i^* \neq 1$. In this case, we swap rows 1 and $i^*$
of $A_1$ so that $a_1 \in R^*$, and call it $A_1'$. Applying the method above, we get a
basis $H'$ of $A^\perp(A_1')$. By swapping columns 1 and $i^*$ and rows 1 and $i^*$ of $H'$,
we get a basis $H$ of $A^\perp(A_1)$. In the following, we denote by $i^*$ the index $i$ such
that $a_i \in R^*$ and $h_{i^*,i^*} = q$. Note that our strategy fails if there is no index $i$ such
that $a_i \in R^*$: this is not an issue, as this occurs only with small probability.


Preliminaries of the construction. Hereafter, we set $W = BG$. We often use
the matrix $T_\kappa = (t_{i,j}) \in R_0^{\kappa \times \kappa}$, where $t_{i,i} = 1$, $t_{i+1,i} = -2$, and all other $t_{i,j}$'s
are 0. Notice that the $i$-th row of $T_\kappa^{-1}$ is $(2^{i-1}, 2^{i-2}, \ldots, 1, 0, \ldots, 0) \in R_0^\kappa$.
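
The claim about the rows of $T_\kappa^{-1}$ can be verified directly. The sketch below treats the
entries as plain integers (the constant polynomials of $R_0$); the helper names are
hypothetical.

```python
def build_T(kappa):
    """kappa x kappa matrix with 1 on the diagonal and -2 just below it."""
    T = [[0] * kappa for _ in range(kappa)]
    for i in range(kappa):
        T[i][i] = 1
        if i + 1 < kappa:
            T[i + 1][i] = -2
    return T

def claimed_inverse(kappa):
    """Lower-triangular matrix whose (i, j) entry is 2^(i-j) for j <= i."""
    return [[2 ** (i - j) if j <= i else 0 for j in range(kappa)]
            for i in range(kappa)]

def matmul(X, Y):
    """Plain integer matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]
```

For $\kappa = 5$ the last row of the inverse is $(16, 8, 4, 2, 1)$, which is what makes the binary
decomposition used in the next subsection work.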


3.2 An Analogue to the Second Alwen-Peikert Construction 


The idea of the second construction in [5] is to have $G$ contain the rows of $H - I_{m_1}$.
This helps decrease the norms of the rows of $P$ and $V$. To do so, we define
$B = \mathrm{diag}(T_\kappa, \ldots, T_\kappa, I_{m_2 - m_1\kappa})$. Note that $B^{-1} = \mathrm{diag}(T_\kappa^{-1}, \ldots, T_\kappa^{-1}, I_{m_2 - m_1\kappa})$.

Let $h'_j$ denote the $j$-th row of $H - I_{m_1}$. Let $W = [W_1; W_2; \ldots; W_{m_1}; 0]$, where
$W_j = [w_{j,1}; \ldots; w_{j,\kappa}] \in R_0^{\kappa \times m_1}$. We compute the $w_{j,k}$'s such that
$h'_j = \sum_{k \leq \kappa} 2^{\kappa - k} w_{j,k}$ and the components of all $w_{j,k}$'s are polynomials with coefficients in $\{0, 1\}$.
By this construction, $T_\kappa^{-1} \cdot W_j$ contains $h'_j$ in the last row. Then, $G = B^{-1} \cdot W$
contains rows $h'_j$ for $j = 1, \ldots, m_1$. The matrix $P = [p_1; \ldots; p_{m_1}]$ picks all rows
$h'_1, \ldots, h'_{m_1}$ in $G$ by setting $p_j = e_{\kappa j} \in R_0^{m_2}$.
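
The binary decomposition behind $W_j$ can be sketched as follows. The code works on a single
coefficient vector with entries in $[0, 2^\kappa)$ and checks that the weighted sum with weights
$2^{\kappa-1}, \ldots, 2, 1$ (the last row of $T_\kappa^{-1}$) recovers the original entries. Function names
and the sample values are hypothetical.

```python
def bit_rows(h, kappa):
    """Rows w_1, ..., w_kappa over {0,1} with h = sum_k 2^(kappa-k) * w_k."""
    return [[(x >> (kappa - k)) & 1 for x in h] for k in range(1, kappa + 1)]

def last_row_of_Tinv_times(W):
    """Last row of T_kappa^{-1} * W, using weights 2^(kappa-1), ..., 2, 1."""
    kappa = len(W)
    return [sum(2 ** (kappa - 1 - i) * W[i][j] for i in range(kappa))
            for j in range(len(W[0]))]
```

With $q = 17$ and $\kappa = \lceil 1 + \log_2 q \rceil = 6$, the vector `[0, 16, 5, 13]` is split into six
0/1 rows, and `last_row_of_Tinv_times` returns the vector unchanged.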

The norm of $S$ is $\max\{\|S_1\|, \|S_2\|\}$, where $S_1 = [V|P]$ and $S_2 = [D|B]$. For
simplicity, we only consider the case where $f = x^n + 1$. In the general case, the
bound on $\|S\|$ involves an extra $\mathrm{EF}(f,2)$ factor.

We have that $\|BG\|^2 = \|W\|^2 \leq n$, since the entries of $h'_j$ are all 0 except one
which is either $h_{i^*,j}$ or $q - 1$. Hence, we obtain that

$$\|S_2\|^2 \leq \|D\|^2 + \|B\|^2 \leq (3\sqrt{nr} + \sqrt{n})^2 + 5 \leq (4\sqrt{nr} + 3)^2.$$

It is obvious that $\|P\| \leq 1$. Additionally, we have that $\|PR\|^2 \leq nr$. Therefore:

$$\|S_1\|^2 \leq \|V\|^2 + \|P\|^2 \leq (\sqrt{nr} + 1)^2 + 1 \leq (\sqrt{nr} + 2)^2,$$

which completes the proof of Theorem 2.


4 From LWE to SIS 


We show that any efficient algorithm solving LWE with some non-negligible 
probability may be used by a quantum machine to efficiently solve SIS with 
non-negligible probability. A crucial property of the reduction is that the matrix 
underlying the SIS and LWE instances is preserved. This allows the reduction 
to remain valid while working on Ideal-SIS and Ideal-LWE. 


Theorem 3. Let $q, m, n$ be integers, and $\alpha \in (0,1)$ with $n > 32$,
$\mathrm{Poly}(n) > m > 5n \log q$ and $\alpha < \min\left(\frac{1}{10\sqrt{\ln(10m)}},\ 0.006\right)$. Suppose that there
exists an algorithm that solves LWE$_{m,q;\Psi_{\alpha q}}$ in time $T$ and with probability
$\varepsilon \geq 4m \exp\left(-\frac{\pi}{4\alpha^2}\right)$. Then there exists a quantum algorithm that solves
SIS$_{m,q;\frac{\sqrt{m}}{2\alpha}}$ in time $\mathrm{Poly}(T, n)$ and with probability
$\frac{\varepsilon^3}{64} - O(\varepsilon^5) - 2^{-\Omega(n)}$. The result still holds when replacing
LWE by Ideal-LWE$^f$ and SIS by Ideal-SIS$^f$, for $f = x^n + 1$ with $n = 2^k > 32$,
$m > 41 \log q$ and $q \equiv 3 \bmod 8$.


When $\alpha = O(1/\sqrt{n})$, the reduction applies even to a subexponential algorithm
for LWE (with success probability $\varepsilon = 2^{-o(n)}$), transforming it into a subexponential
quantum algorithm for SIS (with success probability $\varepsilon = 2^{-o(n)}$). The
reduction also works for larger $\alpha = O(1/\sqrt{\log n})$, but in this case only applies to
polynomial algorithms for LWE (with success probability $\varepsilon = \Omega(1/\mathrm{Poly}(n))$).

The reduction is made of two components. First, we argue that an algorithm
solving LWE provides an algorithm that solves a certain bounded distance decoding
problem, where the error vector is normally distributed. In a second step,
we show that Regev's quantum algorithm [82, Lemma 3.14] can use such an
algorithm to construct small solutions to SIS.
4.1 From LWE to BDD


An algorithm solving LWE allows us to solve, for certain lattices, a variation of 
the Bounded Distance Decoding problem. In that variation of BDD, the error 
vector is sampled according to a specified distribution. 


Definition 3. The problem BDD$_\chi$ with parameter distribution $\chi(\cdot)$ is as follows:
Given an $n$-dimensional lattice $L$ and a vector $t = b + e$ where $b \in L$ and $e$ is
distributed according to $\chi(n)$, the goal is to find $b$. We say that a randomized
algorithm $A$ solves BDD$_\chi$ for a lattice $L$ with success probability $\geq \varepsilon$ if, for
every $b \in L$, on input $t = b + e$, algorithm $A$ returns $b$ with probability $\geq \varepsilon$ over
the choice of $e$ and the randomness of $A$.


For technical reasons, our reduction will require a randomized BDD$_\chi$ algorithm
whose behaviour is independent of the solution vector $b$, even when the error
vector is fixed. This is made precise below.


Definition 4. A randomized algorithm $A$ solving BDD$_\chi$ for lattice $L$ is said
to be strongly solution-independent (SSI) if, for every fixed error vector $e$, the
probability (over the randomness of $A$) that, given input $t = b + e$ with $b \in L$,
algorithm $A$ returns $b$ is independent of $b$.


We show that if we have an algorithm that solves LWE$_{m,q;\Psi_{\alpha q}}$, then we can
construct an algorithm solving BDD$_{\nu_{\alpha q}}$ for some lattices. Moreover, the
constructed BDD algorithm is SSI.


Lemma 7. Let $q, m, n$ be integers and $\alpha \in (0,1)$, with $m, \log q = \mathrm{Poly}(n)$.
Suppose that there exists an algorithm $A$ that solves LWE$_{m,q;\Psi_{\alpha q}}$ in time $T$ and
with probability $\varepsilon \geq 4m \exp\left(-\frac{\pi}{4\alpha^2}\right)$. Then there exists $S \subseteq \mathbb{Z}_q^{m \times n}$ of proportion
$\geq \varepsilon/2$ and an SSI algorithm $A'$ such that if $G \in S$, algorithm $A'$ solves BDD$_{\nu_{\alpha q}}$
for $L(G)$ in time $T + \mathrm{Poly}(n)$ and with probability $\geq \varepsilon/4$.


Proof. If $G \in \mathbb{Z}_q^{m \times n}$ and $s \in \mathbb{Z}_q^n$ are sampled uniformly and if the coordinates
of $e$ are sampled according to $\Psi_{\alpha q}$, then $A$ finds $s$ with probability $\geq \varepsilon$ over the
choices of $G$, $s$ and $e$ and a string $w$ of internal random bits. This implies that
there exists a subset $S$ of the $G$'s of proportion $\geq \varepsilon/2$ such that for any $G \in S$,
algorithm $A$ succeeds with probability $\geq \varepsilon/2$ over the choices of $s$, $e$ and $w$. For
any $G \in S$, we have $\Pr_{s,e,w}[A(Gs + e, w) = s] \geq \varepsilon/2$.

On input $t = b + e$, algorithm $A'$ works as follows: it samples $s$ uniformly in $\mathbb{Z}_q^n$;
it computes $t' = t + Gs$, which is of the form $t' = Gs' + qk + e$, where $k \in \mathbb{Z}^m$; it
calls $A$ on $t' \bmod q$ and finds $s'$ (with probability $\geq \varepsilon/2$); it then computes
$e' = t' - Gs' \bmod q$ and returns $t - e'$. Suppose that $A$ succeeds, i.e., it outputs the correct $s'$.
Then $e' = e \bmod q$. Using the standard tail bound on the continuous Gaussian
and the lower bound on $\varepsilon$ we obtain that $e$ has a component of magnitude $\geq q/2$
with probability $\leq m \exp(-\pi/(2\alpha)^2) \leq \varepsilon/4$. The algorithm thus succeeds with
probability $\geq \varepsilon/2 - \varepsilon/4 = \varepsilon/4$.
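
The re-randomization step of this proof can be sketched in Python. The LWE solver below is
a hypothetical stand-in that performs exhaustive search, so the sketch only runs for tiny
parameters; errors are taken to be integers for simplicity, and all names are illustrative.

```python
import itertools
import random

def centred(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def mat_vec(G, s, q):
    return [sum(g * x for g, x in zip(row, s)) % q for row in G]

def toy_lwe_oracle(G, t, q):
    """Stand-in for A: exhaustive search for the s with the smallest error."""
    def err(s):
        return sum(centred(ti - gi, q) ** 2
                   for ti, gi in zip(t, mat_vec(G, s, q)))
    return min(itertools.product(range(q), repeat=len(G[0])), key=err)

def bdd_from_lwe(G, t, q, rng):
    """Recover b from t = b + e, where b is congruent to G s_b modulo q."""
    s0 = [rng.randrange(q) for _ in range(len(G[0]))]   # fresh uniform shift
    shifted = [(ti + gi) % q for ti, gi in zip(t, mat_vec(G, s0, q))]
    s_found = toy_lwe_oracle(G, shifted, q)             # call A on t' mod q
    e_found = [centred(ti - gi, q)
               for ti, gi in zip(shifted, mat_vec(G, s_found, q))]
    return [ti - ei for ti, ei in zip(t, e_found)]      # b = t - e'
```

Because the shift `s0` is uniform and independent of the input, the instance handed to the
oracle has a uniformly distributed secret whatever $b$ is, which is the source of the
solution-independence used later.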


We now show that an algorithm solving BDD$_{\nu_s}$ can be used to solve a quantized
version of it. This quantization is required for the quantum part of our reduction.
The intuition behind the proof is that the discretization grid is so fine (the
parameter $R$ can be chosen extremely large) that at the level of the grid the
distribution $\nu_s$ looks constant.


Lemma 8. Let $s > 0$ and $L$ be an $n$-dimensional lattice. Suppose that there exists an
SSI algorithm $A$ that solves BDD$_{\nu_s}$ for $L$ in time $T$ and with probability $\varepsilon$. Then
there exists an $R$, whose bit-length is polynomial in $T, n, |\log s|$ and the bit-size
of the given basis of $L$, and an SSI algorithm $A'$ that solves BDD$_{D_{L/R,s}}$ within
a time polynomial in $\log R$ and with probability $\geq \varepsilon - 2^{-\Omega(n)}$.


At this point, we have an $R$ of bit-length polynomial in $T, n, |\log \alpha|$ and an SSI
algorithm $B$ with run-time polynomial in $\log R$ that solves BDD$_{D_{L(G)/R,\alpha q}}$ for
any $G$ in a subset $S \subseteq \mathbb{Z}_q^{m \times n}$ of proportion $\geq \varepsilon/2$, with probability $\geq \varepsilon/4 - 2^{-\Omega(n)}$
over the random choices of $e$ and the internal randomness $w$. In the following we
assume that on input $t = b + e$, algorithm $B$ outputs $e$ when it succeeds, rather
than $b$. We implement $B$ quantumly as follows: the quantum algorithm $B_Q$ maps
the state $|e\rangle |b + e\rangle |w\rangle$ to the state $|e - B(b + e, w)\rangle |b + e\rangle |w\rangle$.


4.2 A New Interpretation of Regev’s Quantum Reduction 


We first recall Regev's quantum reduction [82, Lemma 3.14]. It uses a randomized
BDD oracle $B^{\mathrm{bdd}}$ that finds the closest vector in a given lattice $L$ to a given
target vector, as long as the target is within a prescribed distance $d < \lambda_1(L)/2$ of $L$
(as above, we assume that $B^{\mathrm{bdd}}$ returns the error vector). It returns a sample from
the distribution $D_{\widehat{L}, \frac{\sqrt{n}}{\sqrt{2} d}}$. We implement oracle $B^{\mathrm{bdd}}$ as a quantum oracle $B_Q^{\mathrm{bdd}}$ as
above. We assume $B_Q^{\mathrm{bdd}}$ accepts random inputs of length $\ell$.


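1. Set $R$ to be a large constant and build a quantum state which
is within $\ell_2$ distance $2^{-\Omega(n)}$ of the normalized state corresponding
to $\sum_{w \in \{0,1\}^\ell} \sum_{x \in L/R,\ \|x\| < d} \rho_{\frac{d}{\sqrt{n}}}(x)\, |x\rangle |x \bmod L\rangle |w\rangle$.

2. Apply the BDD oracle $B_Q^{\mathrm{bdd}}$ to the above state to remove the entanglement
and obtain a state which is within $\ell_2$ distance $2^{-\Omega(n)}$ of the normalized state
corresponding to $\sum_{w \in \{0,1\}^\ell} \sum_{x \in L/R,\ \|x\| < d} \rho_{\frac{d}{\sqrt{n}}}(x)\, |0\rangle |x \bmod L\rangle |w\rangle$.

3. Apply the quantum Fourier transform over $\mathbb{Z}_R^n$ to the second register to
obtain a state that is within $\ell_2$ distance $2^{-\Omega(n)}$ of the normalized state
corresponding to $\sum_{x \in \widehat{L},\ \|x\| < \frac{n}{d}} \rho_{\frac{\sqrt{n}}{d}}(x)\, |x \bmod (R \cdot \widehat{L})\rangle$.

4. Measure the latter to obtain a vector $\widehat{b} \bmod R \cdot \widehat{L}$. Using Babai's algorithm [6],
recover $\widehat{b}$ and output it. Its distribution is within statistical distance $2^{-\Omega(n)}$
of $D_{\widehat{L}, \frac{\sqrt{n}}{\sqrt{2} d}}$.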


We now replace the perfect oracle $B_Q^{\mathrm{bdd}}$ by an imperfect one.


Lemma 9. Suppose we are given an $n$-dimensional lattice $L$, parameters
$R > 2^{2n} \lambda_n(L)$ and $s < \frac{\lambda_1(L)}{2\sqrt{2n}}$, and an SSI algorithm $B$ that solves BDD$_{D_{L/R,s}}$ for $L$ with
run-time $T$ and success probability $\varepsilon$. Then there exists a quantum algorithm $\mathcal{R}$
which outputs a vector $\widehat{b} \in \widehat{L}$ whose distribution is within distance
$1 - \varepsilon^2/2 + O(\varepsilon^4) + 2^{-\Omega(n)}$ of $D_{\widehat{L}, \frac{1}{2s}}$. It finishes in time polynomial in $T + \log R$.


Proof. The quantum algorithm $\mathcal{R}$ is Regev's algorithm above with parameter
$d = \sqrt{2n}\, s < \lambda_1(L)/2$, where $B_Q^{\mathrm{bdd}}$ is replaced by the quantum implementation
$B_Q$ of $B$. We just saw that if the BDD$_{D_{L/R,s}}$ oracle succeeded with
probability $1 - 2^{-\Omega(n)}$ then the output vector $\widehat{b}$ would follow a distribution whose
statistical distance to $D_{\widehat{L}, \frac{1}{2s}}$ would be $2^{-\Omega(n)}$. To work around the requirement
that the oracle succeeds with overwhelming probability, we use the notion of
trace distance between two quantum states, which is an adaptation of the statistical
distance (see [Ch. 9]). The trace distance between two (pure) quantum
states $|t_1\rangle$ and $|t_2\rangle$ is $\delta(|t_1\rangle, |t_2\rangle) = \sqrt{1 - |\langle t_1 | t_2 \rangle|^2}$. Its most important property
is that for any generalized measurement (POVM), if $D_1$ (resp. $D_2$) is the resulting
probability distribution when starting from $|t_1\rangle$ (resp. $|t_2\rangle$) then
$\Delta(D_1, D_2) \leq \delta(|t_1\rangle, |t_2\rangle)$. Let $|t_1\rangle$ denote the state at the end of Step 2 of Regev's algorithm
when we use $B_Q^{\mathrm{bdd}}$, and let $|t_2\rangle$ denote the state that we obtain at the end of
Step 2 when we use $B_Q$. We upper bound $\delta(|t_1\rangle, |t_2\rangle)$ as follows.
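
The two facts about the trace distance used here can be checked numerically on small
examples. The sketch below is restricted to real-valued pure states and to a measurement in
the computational basis (one particular POVM); the function names are hypothetical.

```python
import math
import random

def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def trace_distance_pure(t1, t2):
    """sqrt(1 - <t1|t2>^2) for two real unit vectors."""
    overlap = sum(a * b for a, b in zip(t1, t2))
    return math.sqrt(max(0.0, 1.0 - overlap * overlap))

def measurement_tv(t1, t2):
    """Statistical distance between the outcome distributions obtained by
    measuring t1 and t2 in the computational basis."""
    return 0.5 * sum(abs(a * a - b * b) for a, b in zip(t1, t2))
```

For two states with overlap $\varepsilon$, `trace_distance_pure` returns $\sqrt{1 - \varepsilon^2}$, which agrees with
$1 - \varepsilon^2/2$ up to a term of order $\varepsilon^4$; this is the expansion that appears in the statement
of Lemma 9.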

Since $B^{\mathrm{bdd}}(x \bmod L, w) = x$ for $\|x\| < d$, we have that $|t_1\rangle$ is within $\ell_2$ distance
(and hence trace distance) $2^{-\Omega(n)}$ of the normalized state

$$|t_1'\rangle = 2^{-\ell/2} \sum_{w \in \{0,1\}^\ell} \sum_{x \in L/R} \sqrt{D^d_{L/R,s}(x)}\; |0\rangle |x \bmod L\rangle |w\rangle,$$

where $D^d_{L/R,s}$ denotes the normalized distribution obtained by truncating $D_{L/R,s}$
to vectors of norm $< d$. On the other hand, for the imperfect oracle $B$, we have
that $|t_2\rangle$ is within trace distance $2^{-\Omega(n)}$ of the normalized state

$$|t_2'\rangle = 2^{-\ell/2} \sum_{w \in \{0,1\}^\ell} \sum_{x \in L/R} \sqrt{D^d_{L/R,s}(x)}\; |x - B(x \bmod L, w)\rangle |x \bmod L\rangle |w\rangle.$$

Let $S_B = \{(x, w) \in L/R \times \{0,1\}^\ell \mid \|x\| < d \text{ and } B(x \bmod L, w) = x\}$.
Notice that, if $(x, w) \notin S_B$, the states $|x - B(x \bmod L, w)\rangle |x \bmod L\rangle |w\rangle$
and $|0\rangle |x' \bmod L\rangle |w'\rangle$ are orthogonal for all $(x', w')$. Furthermore, if $(x, w) \in S_B$,
the states $|0\rangle |x \bmod L\rangle |w\rangle$ and $|0\rangle |x' \bmod L\rangle |w'\rangle$ are orthogonal for
all $(x', w') \neq (x, w)$ with $\|x'\| < d$, because the mapping $x \mapsto x \bmod L$
is 1-1 over $x$ of norm $< d < \lambda_1(L)/2$. It follows that
$|\langle t_1' | t_2' \rangle| = 2^{-\ell} \sum_{(x,w) \in S_B} D^d_{L/R,s}(x)$. Hence, $|\langle t_1' | t_2' \rangle|$ is equal to the probability $p$
that $B(x \bmod L, w) = x$, over the choices of $x$ from the distribution $D^d_{L/R,s}$
and $w$ uniformly random in $\{0,1\}^\ell$. By Lemma 2, using the fact that $d \geq \sqrt{n}\, s$,
we have $p \geq \widehat{p} - 2^{-\Omega(n)}$, where $\widehat{p}$ is the corresponding probability when $x$ is sampled
from $D_{L/R,s}$. Finally, we have $\widehat{p} = \sum_{x \in L/R} D_{L/R,s}(x) \Pr_w[B(x \bmod L, w) = x]$.
By the strong solution-independence of $B$, we have $\Pr_w[B(x \bmod L, w) = x] =
\Pr_w[B(b + x, w) = x]$ for any fixed $b \in L$. Therefore, $\widehat{p}$ is the success probability
of $B$ in solving BDD$_{D_{L/R,s}}$, so $\widehat{p} \geq \varepsilon$ by assumption. Overall, we conclude
that $\delta(|t_1\rangle, |t_2\rangle) \leq \sqrt{1 - \varepsilon^2} + 2^{-\Omega(n)}$, and hence the output of $\mathcal{R}$ is within statistical
distance $1 - \varepsilon^2/2 + O(\varepsilon^4) + 2^{-\Omega(n)}$ of $D_{\widehat{L}, \frac{1}{2s}}$, as claimed.
To prove Theorem 3 we apply Lemma 9 to the lattices $L(G)$ for $G \in S$, with
algorithm $B$. For that, we need to ensure that the hypothesis $\alpha q < \frac{\lambda_1(L(G))}{2\sqrt{2m}}$ is
satisfied. From Lemma 4 (resp. Lemma 5 in the case of Ideal-LWE), we know that
with probability $1 - 2^{-\Omega(n)}$ over the choice of $G$ in $\mathbb{Z}_q^{m \times n}$, we have $\lambda_1^\infty(L(G)) \geq \frac{q}{4}$
and $\lambda_1(L(G)) \geq 0.07\sqrt{m}\,q$. For such 'good' $G$'s, the hypothesis $\alpha q < \frac{\lambda_1(L(G))}{2\sqrt{2m}}$ is
satisfied, since $\alpha < 0.006$. The set $S'$ of the $G$'s in $S$ for which that condition is
satisfied represents a proportion $\geq \varepsilon/2 - 2^{-\Omega(n)}$ of $\mathbb{Z}_q^{m \times n}$. Suppose now that
$G \in S'$. Lemma 9 shows that we can find a vector $s \in G^\perp = q\,\widehat{L(G)}$ that follows a
distribution whose distance to $D_{G^\perp, \frac{1}{2\alpha}}$ is $\Delta = 1 - \frac{\varepsilon^2}{32} + O(\varepsilon^4) + 2^{-\Omega(n)}$. Thanks
to Lemmas 1 and 2 (since $G \in S'$ and $\alpha < 1/(10\sqrt{\ln(10m)})$, the hypothesis
of Lemma 1 is satisfied), we have that with probability $\geq 1 - 2^{-\Omega(n)} - \Delta =
\frac{\varepsilon^2}{32} - O(\varepsilon^4) - 2^{-\Omega(n)}$, the returned $s$ is a non-zero vector of $G^\perp$ whose norm
is $\leq \frac{\sqrt{m}}{2\alpha}$. Multiplying by the probability $\geq \varepsilon/2 - 2^{-\Omega(n)}$ that $G \in S'$ gives the
claimed success probability and completes the proof of Theorem 3.


5 Cryptographic Applications 


We now use the results of Sections 3 and 4 to construct efficient cryptographic
primitives based on ideal lattices. This includes the first provably secure lattice-based
public-key encryption scheme with asymptotically optimal encryption and
decryption computation costs of $\tilde{O}(1)$ bit operations per message bit.


5.1 Efficient Public-Key Encryption Scheme 


Our scheme is constructed in two steps. Firstly, we use the LWE mapping
$(s, e) \mapsto G \cdot s + e \bmod q$ as an injective trapdoor one-way function, with the
trapdoor being the full-dimensional set of vectors in $G^\perp$ from Section 3, and the
one-wayness being as hard as Ideal-SIS (and hence Ideal-SVP) by Theorem 3.
This is an efficient ideal lattice analogue of some trapdoor functions previously
presented for arbitrary lattices. Secondly, we apply the Goldreich-Levin hardcore
function based on Toeplitz matrices [10, Sec. 2.5] to our trapdoor function, and
XOR the message with the hardcore bits to obtain a semantically secure encryption.
To obtain the $\tilde{O}(1)$ amortized bit complexity per message bit, we use $\Omega(n)$
hardcore bits, which induces a subexponential loss in the security reduction.

Our trapdoor function family Id-Trap is defined in Figure 1. For security
parameter $n = 2^k$, we fix $f(x) = x^n + 1$ and $q = \mathrm{Poly}(n)$ a prime satisfying
$q \equiv 3 \bmod 8$. From Lemma 3, it follows that $f$ splits modulo $q$ into two
irreducible factors of degree $n/2$. We set $\sigma = 1$, $r = 1 + \log_3 q = \tilde{O}(1)$ and
$m = (\lceil \log q \rceil + 1)\sigma + r = \tilde{O}(1)$. We define $R = \mathbb{Z}_q[x]/f$. The following lemma
ensures the correctness of the scheme (this is essentially identical to [Sec. 4.1])
and asserts that the evaluation and inversion functions can be implemented
efficiently.
- Generating a function with trapdoor. Run the algorithm from Theorem 2, using
$f = x^n + 1, n, q, r, \sigma, m$ as inputs. Suppose it succeeds. It returns $g \in (\mathbb{Z}_q[x]/f)^m$
(function index) and a trapdoor full-rank set $S$ of linearly independent vectors
in $\mathrm{rot}_f(g)^\perp \subseteq \mathbb{Z}^{mn \times mn}$ with $\|S\| \leq \sqrt{2}(4\sqrt{nr} + 3) =: L$ (we have $L = \tilde{O}(\sqrt{n})$).

- Function evaluation. Given function index $g$, we define the trapdoor function
$h_g : \mathbb{Z}_q^n \times \mathbb{Z}_q^{mn} \to \mathbb{Z}_q^{mn}$ as follows. On input $s$ uniformly random in $\mathbb{Z}_q^n$ and $e \in \mathbb{Z}_q^{mn}$
sampled from $\overline{\Psi}_{\alpha q}$ (defined as the rounding of $\Psi_{\alpha q}$ to the closest integer vector), we
compute and return: $c = h_g(s, e) := \mathrm{rot}_f(g) \cdot s + e \bmod q$.

- Function inversion. Given $c = h_g(s, e)$ and trapdoor $S$, compute $d = S^T \cdot c \bmod q$
and $e' = S^{-T} \cdot d$ (in $\mathbb{Q}$). Compute $u = c - e' \bmod q$ and $s' = (\mathrm{rot}_f(g_1))^{-1} \cdot u_1 \bmod q$,
where $u_1$ consists of the first $n$ coordinates of $u$. Return $(s', e')$.


Fig. 1. The trapdoor function family Id-Trap 
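
The inversion step of Figure 1 can be traced on a toy, unstructured example. The sketch
below assumes $n = 1$, $m = 2$, $q = 17$, $G = (1, 5)^T$ and a hand-picked short basis of $G^\perp$
(rows $(2, 3)$ and $(3, -4)$); these values, like the function names, are illustrative
assumptions and are not produced by the algorithm of Theorem 2.

```python
from fractions import Fraction

def centred(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def solve_2x2(M, d):
    """Solve M y = d exactly over the rationals (Cramer's rule)."""
    det = Fraction(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    y0 = (d[0] * M[1][1] - M[0][1] * d[1]) / det
    y1 = (M[0][0] * d[1] - d[0] * M[1][0]) / det
    return [y0, y1]

def invert(c, S_T, g, q):
    """Recover (s, e) from c = g*s + e mod q using the rows of S^T."""
    # d = S^T c mod q; it equals S^T e over the integers when S^T e is small.
    d = [centred(sum(r * ci for r, ci in zip(row, c)), q) for row in S_T]
    e = [int(x) for x in solve_2x2(S_T, d)]      # e' = S^{-T} d over Q
    u = [(ci - ei) % q for ci, ei in zip(c, e)]  # u = c - e' mod q
    s = (u[0] * pow(g[0], -1, q)) % q            # invert the first block
    return s, e
```

With $s = 7$ and $e = (1, -1)$ the ciphertext is $c = (8, 0)$, and `invert` returns the pair
`(7, [1, -1])`. The modular inverse via `pow(x, -1, q)` requires Python 3.8 or later.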


Lemma 10. Let $q > 2\sqrt{mn}\,L$ and $\alpha = o(1/(L\sqrt{\log n}))$. Then for any $s \in R$
and for $e$ sampled from $\overline{\Psi}_{\alpha q}$, the inversion algorithm recovers $(s, e)$ with probability
$1 - n^{-\omega(1)}$ over the choice of $e$. Furthermore, the evaluation and inversion
algorithms for $h_g$ can be implemented with run-time $\tilde{O}(n)$.


The one-wayness of Id-Trap is equivalent to the hardness of LWE$_{m,q;\overline{\Psi}_{\alpha q}}$. Furthermore,
an instance of LWE$_{m,q;\Psi_{\alpha q}}$ can be efficiently converted by rounding to
an instance of LWE$_{m,q;\overline{\Psi}_{\alpha q}}$. This proves Lemma 11.

Lemma 11. Any attacker against the one-wayness of Id-Trap (with parameters
$m, \alpha, q$) with run-time $T$ and success probability $\varepsilon$ provides an algorithm
for LWE$_{m,q;\Psi_{\alpha q}}$ with run-time $T$ and success probability $\varepsilon$.


By combining our trapdoor function with the GL hardcore function [10, Sec. 2.5]
we get the encryption scheme of Figure 2.


- Key generation. For security parameter $n$, run the generation algorithm of Id-Trap
to get an $h_g$ and a trapdoor $S$. We can view the first component of the domain of $h_g$
as a subset of $\mathbb{Z}_2^{\ell_I}$ for $\ell_I = O(n \log q) = \tilde{O}(n)$. Generate $r \in \mathbb{Z}_2^{\ell_I + \ell_M}$ uniformly and
define the Toeplitz matrix $M_{GL} \in \mathbb{Z}_2^{\ell_M \times \ell_I}$ (allowing fast multiplication [26]) whose
$i$-th row is $[r_i, \ldots, r_{\ell_I + i - 1}]$. The public key is $(g, r)$ and the secret key is $S$.

- Encryption. Given $\ell_M$-bit message $M$ with $\ell_M = n/\log n = \tilde{\Omega}(n)$ and public
key $(g, r)$, sample $(s, e)$ with $s \in \mathbb{Z}_q^n$ uniform and $e$ sampled from $\overline{\Psi}_{\alpha q}$, and evaluate
$C_1 = h_g(s, e)$. Compute $C_2 = M \oplus (M_{GL} \cdot s)$, where the product $M_{GL} \cdot s$ is computed
over $\mathbb{Z}_2$, and $s$ is viewed as a string over $\mathbb{Z}_2^{\ell_I}$. Return the ciphertext $(C_1, C_2)$.

- Decryption. Given ciphertext $(C_1, C_2)$ and secret key $(S, r)$, invert $C_1$ to compute
$(s, e)$ such that $h_g(s, e) = C_1$, and return $M = C_2 \oplus (M_{GL} \cdot s)$.


Fig. 2. The semantically secure encryption scheme Id-Enc 
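
The masking step of Figure 2 can be sketched on bit lists. The code below builds the rows
$[r_i, \ldots, r_{\ell_I + i - 1}]$ from the random string $r$, computes the product with $s$ over $\mathbb{Z}_2$,
and XORs it with the message. It leaves out the trapdoor function entirely (the bit string
`s_bits` is simply given), and all names are hypothetical.

```python
import random

def toeplitz_rows(r, l_i, l_m):
    """Row i (0-based) is r[i : i + l_i]; needs len(r) >= l_i + l_m - 1."""
    return [r[i:i + l_i] for i in range(l_m)]

def hardcore_bits(r, s_bits, l_m):
    """Matrix-vector product M_GL * s over Z_2."""
    rows = toeplitz_rows(r, len(s_bits), l_m)
    return [sum(a & b for a, b in zip(row, s_bits)) % 2 for row in rows]

def xor_mask(msg_bits, r, s_bits):
    """C_2 = M xor (M_GL * s); applying it twice returns the message."""
    pad = hardcore_bits(r, s_bits, len(msg_bits))
    return [m ^ p for m, p in zip(msg_bits, pad)]
```

Decryption reuses `xor_mask` once the inversion of $C_1$ has recovered $s$, since XORing with the
same pad twice is the identity.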


Theorem 4. Any IND-CPA attacker against Id-Enc with run-time $T$ and success
probability $1/2 + \varepsilon$ provides an algorithm for Ideal-LWE$^f_{m,q;\Psi_{\alpha q}}$ with run-time
$O(2^{3\ell_M} n^3 \varepsilon^{-3} \cdot T)$ and success probability $\Omega(2^{-\ell_M} n^{-1} \cdot \varepsilon)$.
Proof. The attacker can be converted to a GL hardcore function distinguisher
that, given $C_1 = h_g(s, e)$, $M_{GL}$, and an $\ell_M$-bit string $z$, for $s$ sampled uniformly
in $\mathbb{Z}_q^n$, $e$ sampled from $\overline{\Psi}_{\alpha q}$, and $M_{GL}$ constructed as in the key generation
procedure, distinguishes whether $z$ is uniformly random (independent of $s$ and $e$)
or $z = M_{GL} \cdot s$. It has run-time $T$ and advantage $\varepsilon$. The result follows by applying
Lemma 2.5.8, Proposition 2.5.7 and Proposition 2.5.3 in [10]. Note that we do
not need to give the vector $e$ in addition to $s$ as input to the GL function, as $e$
is uniquely determined once $s$ is given (with overwhelming probability).


By using Lemma [U] and Theorems M Bland Æ we get our main result. 


Corollary 1. Any IND-CPA attacker against encryption scheme Id-Enc with
run-time $2^{o(n)}$ and success probability $1/2 + 2^{-o(n)}$ provides a quantum algorithm
for $\tilde{O}(n^2)$-Ideal-SVP with $f(x) = x^n + 1$ and $n = 2^k$, with run-time $2^{o(n)}$ and
overwhelming success probability. Furthermore, the scheme Id-Enc encrypts and
decrypts $\tilde{\Omega}(n)$ bits within $\tilde{O}(n)$ bit operations, and its keys have $\tilde{O}(n)$ bits.


5.2 Further Applications 


Our results have several other applications, adapting various known construc- 
tions for unstructured lattices to ideal lattices, as summarised below. 


CCA2-secure encryption. Peikert derived a CCA2-secure encryption
scheme from the non-structured variant of the trapdoor function family Id-Trap
from Figure 1, using a framework for building a CCA2-secure scheme
from a collection of injective trapdoor functions that is secure under correlated
product (i.e., one-wayness is preserved if several functions are evaluated on the
same input). That approach can be applied to Id-Trap, using the equality
between Ideal-LWE$_{km}$ and the product of $k$ instances of Ideal-LWE$_m$, multiple
hardcore bits as in Id-Enc, and instantiating the required strongly unforgeable
signature with the Ideal-SVP-based scheme of [18]. By choosing $k = \tilde{O}(n)$ (the
bit-length of the verification key in [18]) and $\alpha = \tilde{O}(n^{-3/2})$, we obtain a CCA2-secure
scheme that encrypts $\tilde{\Omega}(n)$ bits within $\tilde{O}(n^2)$ bit operations and whose
security relies on the exponential quantum hardness of $\tilde{O}(n^4)$-Ideal-SVP.


Trapdoor signatures. Gentry et al. [9] give a construction of a trapdoor
signature (in the random oracle model) from any family of collision-resistant
preimage sampleable functions (PSFs). They show how to sample preimages
of $f_G(x) = x^T G$, where $G \in \mathbb{Z}_q^{m \times n}$, using a full-dimensional set of short vectors
in $G^\perp$. By applying this to $G = \mathrm{rot}_f(g)$ and using the trapdoor generation
algorithm from Section 3, we obtain a PSF whose collision resistance relies
on Ideal-SIS, and hence Ideal-SVP, and thus a structured variant of the trapdoor
signature scheme of [9], with $\tilde{O}(n)$ verification time and signature length.


ID-based identification. From lattice-based signatures, we derive ID-based
identification (IBI) and ID-based signature (IBS). Applying the standard strategy,
we construct lattice-based IBI schemes as follows: the master generates a
key pair of a lattice-based signature scheme, say $(G, S)$; each user obtains from
the master a short vector $e$ such that $e^T G = H(id)$, where $H$ is a random oracle;
the prover proves to the verifier that he/she has a short vector $e$ through the
Micciancio-Vadhan protocol [24]. This combination yields concurrently secure
IBI schemes based on $\tilde{O}(n^2)$-SVP and $\tilde{O}(n^2)$-Ideal-SVP in the random oracle
model. As the MV protocol is witness indistinguishable, we can use the Fiat-Shamir
heuristic and derive lattice-based IBS schemes.


ID-based encryption (IBE). It is shown in [9] that the unstructured variant
of the above trapdoor signature can be used as the identity key extraction for
an IBE scheme. This requires a 'dual' version of Id-Enc, in which the public key
is of the form $(g, u)$, where $u = H(id)$ is the hashed identity, and the secret
key is the signature of $id$, i.e., a short preimage of $u$ under $f_g(x) = x^T \mathrm{rot}_f(g)$.
We construct the 'dual' encryption as $(C_1, C_2)$ where $C_1 = h_g(s, e)$ and
$C_2 = T_\ell(\mathrm{rot}_f(u) \cdot s) + M$, where $M \in \mathbb{Z}_q^\ell$ contains the message and $T_\ell(\mathrm{rot}_f(u) \cdot s)$ denotes
the first $\ell$ coordinates of $\mathrm{rot}_f(u) \cdot s \bmod q$. By adapting the results of [13], we
show that $T_\ell(\mathrm{rot}_f(u) \cdot s)$ is an exponentially-secure generic hardcore function for
uniform $u \in \mathbb{Z}_q^n$, when $\ell = o(n)$. This allows us to prove the IND-CPA security
of the resulting IBE scheme based on the hardness of Ideal-SVP.
Abstract. We describe a public-key encryption scheme based on lattices,
specifically on the hardness of the learning with errors (LWE) problem,
that is secure against chosen-ciphertext attacks while
admitting (a variant of) smooth projective hashing. This encryption
scheme suffices to construct a protocol for password-based authenticated
key exchange (PAKE) that can be proven secure based on the LWE assumption
in the standard model. We thus obtain the first PAKE protocol
whose security relies on a lattice-based assumption.


1 Password-Based Authenticated Key Exchange 


Protocols for password-based authenticated key exchange (PAKE) enable two 
users to generate a common, cryptographically-strong key based on an initial, 
low-entropy, shared secret (i.e., a password). The difficulty in this setting is to 
prevent off-line dictionary attacks where an adversary exhaustively enumerates 
potential passwords on its own, attempting to match the correct password to 
observed protocol executions. Roughly, a PAKE protocol is “secure” if off-line 
attacks are of no use and the best attack is an on-line dictionary attack where an 
adversary must actively try to impersonate an honest party using each possible 
password. On-line attacks of this sort are inherent in the model of password- 
based authentication; more importantly, they can be detected by the server as 
failed login attempts and (at least partially) defended against. 

Due to the widespread use of passwords, a significant amount of research has 
focused on designing PAKE protocols. Early work (see also [[4]) considered 
a “hybrid” model where users share public keys in addition to a password. In 
the more challenging “password-only” setting clients and servers are required to 
share only a password. Bellovin and Merritt M] initiated research in this direc- 
tion, and presented a PAKE protocol with heuristic arguments for its security. 
It was not until several years later that formal models for PAKE were devel- 
oped BEII], and provably secure PAKE protocols were shown in the random 
oracle/ideal cipher models BBIJ. 
Goldreich and Lindell [11] constructed the first PAKE protocol without random
oracles, and their approach remains the only one for the plain model where
there is no additional setup. Unfortunately, their protocol is inefficient in terms of 
communication, computation, and round complexity. (Nguyen and Vadhan 
show efficiency improvements, but achieve a weaker notion of security. In any 
case, their protocol is similarly impractical.) The Goldreich-Lindell protocol also 
does not tolerate concurrent executions by the same party. 

Katz, Ostrovsky, and Yung [J demonstrated the first efficient PAKE proto- 
col with a proof of security in the standard model; extensions and improvements 
of this protocol were given in DEIGI]. In contrast to the work of Goldreich 
and Lindell, these protocols are secure even under concurrent executions by the 
same party. On the other hand, these protocols all require a common reference 
string (CRS). While this may be less appealing than the “plain model,” reliance 
on a CRS does not appear to be a serious drawback in the context of PAKE 
since the CRS can be hard-coded into the protocol implementation. A different 
PAKE protocol in the CRS model is given by Jiang and Gong [5]. 


PAKE based on lattices? Cryptographic primitives based on lattices are ap- 
pealing because of known worst-case/average-case connections between lattice 
problems, as well as because several lattice problems are currently immune to 
quantum attacks. Also, the best-known algorithms for several lattice problems 
require exponential time (in contrast to sub-exponential algorithms for, e.g.,
factoring). None of the existing PAKE constructions (in either the random oracle
or standard models), however, can be instantiated with lattice-based
assumptions.¹ The barrier to constructing a lattice-based PAKE protocol using the
KOY/GL approach is that this approach requires a CCA-secure encryption
scheme (more generally, a non-malleable commitment scheme) with an
associated smooth projective hash system [MI]. (See Section 2.) Until recently, the
existence of CCA-secure encryption schemes based on lattices (even ignoring the 
additional requirement of smooth projective hashing) was open. Peikert and Wa- 
ters [22] gave the first constructions of CCA-secure encryption based on lattices, 
but the schemes they propose are not readily amenable to the smooth projective 
hashing requirement. Subsequent constructions do not immediately 
support smooth projective hashing either. 


1.1 Our Results 


Building on ideas of 2420912], we show a new construction of a CCA-secure 
public-key encryption scheme based on the hardness of the learning with er- 
ror (LWE) problem [23]. We then demonstrate (a variant of) a smooth projective 
hash system for our scheme. This is the most technically difficult aspect of our 
work, and is of independent interest as the first construction of a smooth
projective hash system (for a conjectured hard-on-average language) based on lattice
assumptions. (Instantiating the smooth projective hash framework using lattice
assumptions is stated as an open question in [21].) Finally, we show that our
encryption scheme can be plugged into a modification of the
Katz-Ostrovsky-Yung/Gennaro-Lindell framework to give a PAKE protocol based on the
LWE assumption.

¹ To the best of our knowledge this includes the protocol of Goldreich and Lindell [11],
which requires a one-to-one one-way function on an infinite domain (in addition to
oblivious transfer, which can be based on lattice assumptions [21]).


Organization of the paper. In Section 2 we define a variant of smooth
projective hashing (SPH) that we call approximate SPH. We then show in Section 3
that a CCA-secure encryption scheme having an approximate SPH system
suffices for our desired application to PAKE.

The main technical novelty of our paper is in the sections that follow. In
Section 4 we review the LWE problem and some preliminaries. As a prelude to
our main construction, we show in Section 5 a CPA-secure encryption scheme
based on the LWE problem, with an associated approximate SPH system. In
Section 6 we describe how to extend this initial scheme to obtain CCA-security.

Throughout the paper, we denote the security parameter by n. 


2 Approximate Smooth Projective Hash Functions 


Smooth projective hash functions were introduced by Cramer and Shoup [6];
we follow (and adapt) the treatment of Gennaro and Lindell [9], who extend
the original definition. Rather than aiming for utmost generality, we tailor the
definitions to our eventual application.

Roughly speaking, the differences between our definition and that of
Gennaro-Lindell are as follows. (This discussion assumes familiarity with [9]; for the reader
not already familiar with that work, a self-contained description is given below.)
In [9] there are sets $X$ and $L \subset X$; correctness is guaranteed for $x \in L$, while
smoothness is guaranteed for $x \in X \setminus L$. Here, we require only approximate
correctness, and moreover only for elements in a subset $L \subset \bar{L}$, where $\bar{L}$
plays the role of the set called $L$ in [9]. Details follow.

Fix a CCA-secure (labeled) public-key encryption scheme (Gen, Enc, Dec) and 
an efficiently recognizable message space D (which will correspond to the dic- 
tionary of passwords in our application to PAKE). We assume the encryption 
scheme defines a notion of ciphertext validity such that (1) validity of a cipher- 
text (with respect to pk) can be determined efficiently using pk alone, and (2) all 
honestly generated ciphertexts are valid. We also assume no decryption error. 

For the rest of the discussion, fix a key pair (pk, sk) as output by $\mathsf{Gen}(1^n)$
and let $\mathcal{C}$ denote the set of valid ciphertexts with respect to pk. Define sets
$X$, $\{L_m\}_{m \in D}$, and $L$ as follows. First, set

$$X = \{(\mathsf{label}, C, m) \mid \mathsf{label} \in \{0,1\}^n;\ C \in \mathcal{C};\ m \in D\}.$$

Next, for $m \in D$ let

$$L_m = \{(\mathsf{label}, \mathsf{Enc}_{pk}(\mathsf{label}, m), m) \mid \mathsf{label} \in \{0,1\}^n\} \subset X;$$

i.e., $L_m$ is the set of honestly generated encryptions of $m$ (using any label). Let
$L = \bigcup_{m \in D} L_m$. Finally, define

$$\bar{L}_m = \{(\mathsf{label}, C, m) \mid \mathsf{label} \in \{0,1\}^n;\ \mathsf{Dec}_{sk}(\mathsf{label}, C) = m\},$$

and set $\bar{L} = \bigcup_{m \in D} \bar{L}_m$. (Recall we assume no decryption error, and so $\bar{L}_m$
depends only on pk.) Note that $L_m \subseteq \bar{L}_m$ for all $m$. Furthermore, for any ciphertext
$C$ and $\mathsf{label} \in \{0,1\}^n$ there is at most one $m \in D$ for which $(\mathsf{label}, C, m) \in \bar{L}$.


Approximate smooth projective hash functions. An approximate smooth
projective hash function is a collection of keyed functions $\{H_k : X \to \{0,1\}^n\}_{k \in K}$,
along with a projection function $\alpha : K \times (\{0,1\}^* \times \mathcal{C}) \to S$, satisfying notions of
(approximate) correctness and smoothness:

Approximate correctness: If $x = (\mathsf{label}, C, m) \in L$ then the value of $H_k(x)$
is approximately determined by $\alpha(k, \mathsf{label}, C)$ and $x$ (in a sense we will make
precise below).

Smoothness: If $x \in X \setminus \bar{L}$ then the value of $H_k(x)$ is statistically close to
uniform given $\alpha(k, \mathsf{label}, C)$ and $x$ (assuming $k$ was chosen uniformly in $K$).

We stress that, in contrast to [9], we require nothing for $x \in \bar{L} \setminus L$; furthermore,
even for $x \in L$ we require only approximate correctness. We highlight also that,
as in P], the projection function $\alpha$ should be a function of label, $C$ only.

Formally, an $\epsilon(n)$-approximate smooth projective hash function is defined
by a sampling algorithm that, given pk, outputs
$(K, G, H = \{H_k : X \to \{0,1\}^n\}_{k \in K}, S, \alpha : K \times (\{0,1\}^* \times \mathcal{C}) \to S)$ such that:

1. There are efficient algorithms for (1) sampling a uniform $k \in K$, (2) computing
   $H_k(x)$ for all $k \in K$ and $x \in X$, and (3) computing $\alpha(k, \mathsf{label}, C)$ for
   all $k \in K$ and $(\mathsf{label}, C) \in \{0,1\}^* \times \mathcal{C}$.

2. For $x = (\mathsf{label}, C, m) \in L$ the value of $H_k(x)$ is approximately determined
   by $\alpha(k, \mathsf{label}, C)$, relative to the Hamming metric. Specifically, let $\mathrm{Ham}(a, b)$
   denote the Hamming distance of two strings $a, b \in \{0,1\}^n$. Then there is
   an efficient algorithm $H'$ that takes as input $s = \alpha(k, \mathsf{label}, C)$ and
   $\bar{x} = (\mathsf{label}, C, m, r)$ for which $C = \mathsf{Enc}_{pk}(\mathsf{label}, m; r)$ and satisfies:

   $$\Pr[\mathrm{Ham}(H_k(x), H'(s, \bar{x})) > \epsilon \cdot n] = \mathrm{negl}(n),$$

   where the probability is taken over choice of $k$.

3. For any $x = (\mathsf{label}, C, m) \in X \setminus \bar{L}$, the following two distributions have
   statistical distance negligible in $n$:

   $$\{k \leftarrow K;\ s = \alpha(k, \mathsf{label}, C) : (s, H_k(x))\}$$

   and

   $$\{k \leftarrow K;\ s = \alpha(k, \mathsf{label}, C);\ v \leftarrow \{0,1\}^n : (s, v)\}.$$


3 A PAKE Protocol from Approximate SPH 


We use the standard definition of security for PAKE BEZI]. 

Here, we show that a modification of the Gennaro-Lindell framework [9] can 
be used to construct a PAKE protocol from any CCA-secure encryption scheme 
that has associated with it an approximate smooth projective hash function as 
Common reference string: pk

Client(w)                                            Server(w)

(VK, SK) ← K(1^n)
r ← {0,1}^*
label := VK | Client | Server
C := Enc_pk(label, w; r)
                        ----- Client | VK | C ----->
                                                     r′ ← {0,1}^*
                                                     label′ := ε
                                                     C′ := Enc_pk(label′, w; r′)
                                                     label := VK | Client | Server
                                                     k′ ← K; s′ := α(k′, label, C)
                        <---- Server | C′ | s′ -----
label′ := ε
k ← K; s := α(k, label′, C′)
tk := H_k(label′, C′, w) ⊕ H_k′(label, C, w)
sk ← {0,1}^ℓ; c := ECC(sk)
Δ := tk ⊕ c
σ ← Sign_SK(C | C′ | s′ | s | Δ)
                        -------- s | Δ | σ -------->
                                                     if Vrfy_VK(C | C′ | s′ | s | Δ, σ) = 1:
                                                       tk′ := H_k(label′, C′, w) ⊕ H_k′(label, C, w)
                                                       sk := ECC^{-1}(tk′ ⊕ Δ)

Fig. 1. A 3-round PAKE protocol. The common session key is sk.


defined in Section 2. A high-level overview of the protocol is given in Figure 1;
a more detailed discussion follows.


Setup. We assume a common reference string is established before any
executions of the protocol take place. The common reference string consists of a
public key pk for a CCA-secure encryption scheme (Gen, Enc, Dec) that has an
associated $\epsilon$-approximate smooth projective hash system
$(K, G, H = \{H_k : X \to \{0,1\}^n\}_{k \in K}, S, \alpha : K \times (\{0,1\}^* \times \mathcal{C}) \to S)$.
We stress that no parties in the system need to hold the secret key corresponding to pk.


Protocol execution. We now describe an execution of the protocol between an 
honest client Client and server Server, holding common password w. To begin, 
the client runs a key-generation algorithm K for a one-time signature scheme 
to generate verification key VK and corresponding secret (signing) key SK. The 
client sets label := VK|Client|Server and then encrypts the password $w$ using this
label to obtain ciphertext $C$. It then sends the message Client|VK|$C$ to the server.

Upon receiving the initial message Client|VK|$C$ from the client, the server
computes its own encryption of the password using label $\mathsf{label}' = \varepsilon$, resulting in
a ciphertext $C'$. The server also chooses a random hash key $k' \leftarrow K$ and then
computes the projection $s' := \alpha(k', \mathsf{label}, C)$. It sends $C'$ and $s'$ to the client.

After receiving the second protocol message from the server, the client chooses
a random hash key $k \leftarrow K$ and computes the projection $s := \alpha(k, \mathsf{label}', C')$.
At this point it computes a temporary session key
$tk = H_k(\mathsf{label}', C', w) \oplus H_{k'}(\mathsf{label}, C, w)$,
where $H_k(\mathsf{label}', C', w)$ is computed using the known hash key $k$,
and $H_{k'}(\mathsf{label}, C, w)$ is computed using the projection $s'$ together with the
randomness $r$ that was used to generate $C$. (Recall that $C$ is an honestly generated
encryption of $w$.) Up to this point, the protocol follows the Gennaro-Lindell
framework exactly. As will become clear, however, the server will not be able to
recover $tk$ but will instead only recover some value $tk'$ that is close to $tk$; the
rest of the client's computation is aimed at repairing this defect.

The client chooses a random session key $sk \in \{0,1\}^{\ell}$ for some $\ell$ to be specified.
Let $\mathrm{ECC} : \{0,1\}^{\ell} \to \{0,1\}^n$ be an error-correcting code correcting a $2\epsilon$-fraction
of errors. The client computes $c := \mathrm{ECC}(sk)$ and sets $\Delta := tk \oplus c$. Finally, it signs
$C|C'|s'|s|\Delta$ and sends $s$, $\Delta$, and the resulting signature $\sigma$ to the server.

The server verifies $\sigma$ in the obvious way and rejects if the signature is invalid.
Otherwise, the server computes a temporary session key $tk'$ analogously to the
way the client did: that is, the server sets
$tk' = H_k(\mathsf{label}', C', w) \oplus H_{k'}(\mathsf{label}, C, w)$,
where $H_{k'}(\mathsf{label}, C, w)$ is computed using the hash key $k'$ known to the server,
and $H_k(\mathsf{label}', C', w)$ is computed using the projection $s$ together with the
randomness $r'$ that was used to generate $C'$. (Recall that $C'$ is an honestly generated
encryption of $w$.) Finally, the server computes $sk := \mathrm{ECC}^{-1}(tk' \oplus \Delta)$.


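Correctness. We now argue that, in an honest execution of the protocol, the
client and server compute matching session keys with all but negligible
probability. Approximate correctness of the smooth projective hash function implies that
$H_{k'}(\mathsf{label}, C, w)$ as computed by the client is within Hamming distance $\epsilon n$ from
$H_{k'}(\mathsf{label}, C, w)$ as computed by the server, except with negligible probability.
The same holds for $H_k(\mathsf{label}', C', w)$. Thus, with all but negligible probability
we have $\mathrm{Ham}(tk, tk') \le 2\epsilon \cdot n$. Assuming this is the case, and recalling that the
client sends the masked codeword $tk \oplus c$ (so that $c$ equals $tk$ XORed with the
received value, written $\Delta$ below), we have

$$\mathrm{Ham}(tk' \oplus \Delta, c) = \mathrm{Ham}(tk' \oplus \Delta, tk \oplus \Delta) = \mathrm{Ham}(tk', tk) \le 2\epsilon \cdot n,$$

and so $\mathrm{ECC}^{-1}(tk' \oplus \Delta) = \mathrm{ECC}^{-1}(c) = sk$.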
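The masking step in the correctness argument can be illustrated with a small sketch. It is not the paper's instantiation: the repetition code below (with the hypothetical factor `REP`) is a toy stand-in for ECC that corrects only a few errors per block rather than an arbitrary constant fraction, and the names `client_mask` and `server_unmask` are made up for illustration.

```python
import random

REP = 5  # hypothetical repetition factor; each key bit is repeated REP times


def ecc_encode(bits):
    """Toy ECC: repeat every bit REP times."""
    return [b for b in bits for _ in range(REP)]


def ecc_decode(word):
    """Majority vote inside each block of REP bits."""
    blocks = [word[i:i + REP] for i in range(0, len(word), REP)]
    return [1 if 2 * sum(block) > REP else 0 for block in blocks]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def client_mask(tk, sk):
    """Client side: mask the codeword of sk with the temporary key tk."""
    return xor(tk, ecc_encode(sk))


def server_unmask(tk_prime, mask):
    """Server side: unmask with its own (approximately equal) key, then decode."""
    return ecc_decode(xor(tk_prime, mask))


rng = random.Random(0)
sk = [1, 0, 1, 1]
tk = [rng.randrange(2) for _ in range(len(sk) * REP)]
mask = client_mask(tk, sk)

# The server's temporary key differs from the client's in a few positions
# (at most two flips in any block of five, so majority decoding still works).
tk_prime = list(tk)
for i in (0, 6, 7, 19):
    tk_prime[i] ^= 1

recovered = server_unmask(tk_prime, mask)
```

Because XORing with the mask preserves Hamming distance, the server decodes a word that is exactly as far from the codeword as its key is from the client's key.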


Security. The proof of security of the protocol follows [17, 9] closely; we sketch
the main ideas. First, as in [17, 9], we note that for a passive adversary (i.e.,
one that simply observes interactions between the server and the client), the
shared session-key is pseudorandom. This is simply because the transcript of
each interaction consists of semantically-secure encryptions of the password $w$
and the projected keys of the approximate SPH system.

It remains to deal with active (man-in-the-middle) adversaries that modify
the messages sent from the client to the server and back. The crux of our proof,
as in [17, 9], is a combination of the following two observations (for concreteness,
consider an adversary that interacts with a client instance holding password $w$).


— By the CCA-security of the encryption scheme, the probability that the ad- 
versary can construct a new ciphertext that decrypts to the client’s password 
w is at most q/|D| + negl(n), where q is the number of on-line attacks and 
D is the password dictionary. 

— If the adversary sends the client a ciphertext that does not decrypt to the 
client’s password, then the session-key computed by the client is statistically 
close to uniform conditioned on the adversary’s view. 


We defer a complete proof to the full version. 

Recalling the definitions from Section 2, note that correctness of the protocol
relies on (approximate) correctness for honestly generated encryptions of the
correct password (i.e., for $x \in L$), whereas security requires smoothness for
ciphertexts that do not decrypt to the correct password (i.e., for $x \notin \bar{L}$).


4 The Learning with Errors Problem 


The “learning with errors” (LWE) problem was introduced by Regev [23] as a
generalization of the “learning parity with noise” problem. For positive integers
$n$ and $q \ge 2$, a vector $s \in \mathbb{Z}_q^n$, and a probability distribution $\chi$ on $\mathbb{Z}_q$, let $A_{s,\chi}$
be the distribution obtained by choosing a vector $a \in \mathbb{Z}_q^n$ uniformly at random
and a noise term $x \leftarrow \chi$, and outputting $(a, \langle a, s \rangle + x) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$.

For an integer $q = q(n)$ and an error distribution $\chi = \chi(n)$ over $\mathbb{Z}_q$, the
learning with errors problem $\mathrm{LWE}_{q,\chi}$ is defined as follows: Given access to an oracle
that outputs (polynomially many) samples from $A_{s,\chi}$ for a uniformly random
$s \in \mathbb{Z}_q^n$, output $s$ with noticeable probability. The decisional variant of the LWE
problem, denoted $\mathrm{distLWE}_{q,\chi}$, is to distinguish samples chosen according to $A_{s,\chi}$
for a uniformly random $s \in \mathbb{Z}_q^n$ from samples chosen according to the uniform
distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_q$. Regev [23] shows that for $q = \mathrm{poly}(n)$ prime, the LWE
and distLWE problems are polynomially equivalent.


Gaussian error distributions. For any $r > 0$, the density function of a
one-dimensional Gaussian distribution over $\mathbb{R}$ is given by $D_r(x) = 1/r \cdot \exp(-\pi (x/r)^2)$.
In this work we always use a truncated Gaussian distribution, i.e., the Gaussian
distribution $D_r$ whose support is restricted to $x$ such that $|x| < r\sqrt{n}$. The
truncated and non-truncated distributions are statistically close, and we drop the word
“truncated” from now on. For $\beta > 0$, define $\bar{\Psi}_\beta$ to be the distribution on $\mathbb{Z}_q$
obtained by drawing $y \leftarrow D_\beta$ and outputting $\lfloor q \cdot y \rceil \pmod q$. We write $\mathrm{LWE}_{q,\beta}$ as
an abbreviation for $\mathrm{LWE}_{q,\bar{\Psi}_\beta}$.

We also define the discrete Gaussian distribution $D_{\mathbb{Z}^m, r}$ over the integer lattice
$\mathbb{Z}^m$, which assigns probability proportional to $\prod_{i \in [m]} D_r(e_i)$ to each $e \in \mathbb{Z}^m$. It
is possible to efficiently sample from $D_{\mathbb{Z}^m, r}$ for any $r > 0$ [10].

Evidence for the hardness of $\mathrm{LWE}_{q,\beta}$ follows from results of Regev [23], who
gave a quantum reduction from approximating certain problems on n-dimensional
lattices in the worst case to within $\tilde{O}(n/\beta)$ factors to solving $\mathrm{LWE}_{q,\beta}$ for dimension
$n$, subject to the condition that $\beta \cdot q > 2\sqrt{n}$. Recently, Peikert also gave a
related classical reduction for similar parameters. For our purposes, we note that
the $\mathrm{LWE}_{q,\beta}$ problem is believed to be hard (given the state-of-the-art in lattice
algorithms) for any polynomial $q$ and inverse-polynomial $\beta$ (subject to the above
condition).


Matrix notation for LWE. In this paper, we view all our vectors as column
vectors. At times, we find it convenient to describe the LWE problem $\mathrm{LWE}_{q,\beta}$
using a compact matrix notation: find $s$ given $(A, As + x)$, where $A \leftarrow \mathbb{Z}_q^{m \times n}$
is chosen uniformly and $x \leftarrow \bar{\Psi}_\beta^m$. We also use similar notation for the decision
version distLWE.
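As a rough illustration of the matrix notation, the sketch below generates one LWE instance $(A, As + x)$ in pure Python. The function names and the concrete parameters are assumptions made for this example only, the truncation of the Gaussian is omitted, and the parameters are far too small to be secure.

```python
import math
import random


def sample_psi(q, beta, rng):
    """Draw y from D_beta, then output round(q*y) mod q.

    The density (1/beta)*exp(-pi*(y/beta)^2) is a normal distribution with
    standard deviation beta/sqrt(2*pi). Truncation is omitted in this sketch.
    """
    y = rng.gauss(0.0, beta / math.sqrt(2 * math.pi))
    return round(q * y) % q


def lwe_instance(n, m, q, beta, rng):
    """Return (A, s, x, b) with b = A*s + x (mod q), A uniform in Z_q^{m x n}."""
    s = [rng.randrange(q) for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    x = [sample_psi(q, beta, rng) for _ in range(m)]
    b = [(sum(a * sj for a, sj in zip(row, s)) + xi) % q for row, xi in zip(A, x)]
    return A, s, x, b


def centered(v, q):
    """Representative of v mod q in the interval [-(q//2), q//2]."""
    return ((v + q // 2) % q) - q // 2


rng = random.Random(0)
n, m, q, beta = 4, 8, 12289, 0.001  # toy parameters, chosen only for the demo
A, s, x, b = lwe_instance(n, m, q, beta, rng)

# Anyone who knows s can strip off A*s and is left with the short noise vector.
residual = [centered(bi - sum(a * sj for a, sj in zip(row, s)), q)
            for row, bi in zip(A, b)]
```

The search problem asks for `s` given only `A` and `b`; the decision problem asks whether `b` looks like this or like a uniform vector.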


Connection to lattices. The LWE problem can be thought of as a
“bounded-distance decoding problem” on a particular kind of m-dimensional lattice defined
by the matrix $A$. Specifically, define the lattice

$$\Lambda(A) = \{y \in \mathbb{Z}^m : \exists s \in \mathbb{Z}^n \text{ s.t. } y = As \pmod q\}.$$

The LWE problem can then be restated as: given $y$ which is the sum of a lattice
point $As$ and a short “noise vector” $x$, find the “closest” lattice vector $As$
(equivalently, recover $s$). One can show that as long as $x$ is short (say, $\|x\| < q/16$),
there is a unique closest vector to $y$ (see, e.g., [Q)).


4.1 Some Supporting Lemmas 


We present two technical lemmas regarding the LWE problem that will be used
to prove smoothness of our (approximate) SPH systems in Sections 5.2 and 6.2.

If $m \ge n \log q$, the lattice $\Lambda(A)$ is quite sparse. In fact, we expect most vectors
$z \in \mathbb{Z}_q^m$ to be far from $\Lambda(A)$. The first lemma (originally shown in [23]) formalizes
this intuition.

Let $\mathrm{dist}(z, \Lambda(A))$ denote the distance of the vector $z$ from the lattice $\Lambda(A)$.
The lemma shows that for most matrices $A \in \mathbb{Z}_q^{m \times n}$, the fraction of vectors
$z \in \mathbb{Z}_q^m$ that are “very close” to $\Lambda(A)$ is “very small”. The proof is by the
probabilistic method, and appears in the full version.


Lemma 1. Let $n, q, m$ be integers such that $m \ge n \log q$. For all but a negligible
fraction of matrices $A$,

$$\Pr_{z \leftarrow \mathbb{Z}_q^m}\left[\mathrm{dist}(z, \Lambda(A)) \le \sqrt{q}/4\right] \le q^{-(n+m)/2}.$$

Fix a number $r > 0$, and let $e \leftarrow D_{\mathbb{Z}^m, r}$ be drawn from the discrete Gaussian
distribution over the integer lattice $\mathbb{Z}^m$. If the vector $z$ is (close to) a linear
combination of the columns of $A$, then given $e^T A$ one can (approximately)
predict $e^T z$. The second lemma shows a converse of this statement when $r$ is
large enough. Namely, it says that if $z$ and all its non-zero multiples are far
from the lattice $\Lambda(A)$, then $e^T A$ does not give any information about $e^T z$.
In other words, given $e^T A$ (where $e \leftarrow D_{\mathbb{Z}^m, r}$ for a large enough $r$), $e^T z$ is
statistically close to random. This lemma was first shown in [10], and was used
in the construction of an oblivious transfer protocol in [21].

More formally, for a matrix $A \in \mathbb{Z}_q^{m \times n}$ and a vector $z \in \mathbb{Z}_q^m$, let $\Delta_r(A, z)$
denote the statistical distance between the uniform distribution on $\mathbb{Z}_q^{n+1}$ and
the distribution of $(e^T A, e^T z)$, where $e \leftarrow D_{\mathbb{Z}^m, r}$. Then,

Lemma 2 ([10, Lemma 6.3]). Let $r \ge \sqrt{q} \cdot \omega(\sqrt{\log n})$. Then for most matrices
$A \in \mathbb{Z}_q^{m \times n}$, the following is true: if $z \in \mathbb{Z}_q^m$ is such that for all non-zero $a \in \mathbb{Z}_q$,
$\mathrm{dist}(az, \Lambda(A)) > \sqrt{q}/4$, then $\Delta_r(A, z) \le \mathrm{negl}(n)$.

5 Approximate Smooth Projective Hashing from Lattices 


As a warmup to our main result we first construct a CPA-secure encryption 
scheme with an approximate SPH system. The main ideas in our final construc- 
tion are already present here. 


5.1 A CPA-Secure Encryption Scheme 


The encryption scheme we use is a variant of the scheme presented in [0BU],
and is based on the hardness of the LWE problem. We stress that the novelty of
this work is in constructing an approximate SPH system for this scheme.

We begin by describing a basic encryption scheme having decryption time
exponential in the message length $\ell$.² We then modify the scheme so that decryption
can be done in polynomial time.

The message space is $\mathbb{Z}_q^{\ell}$ for some integers $q, \ell$. In the basic encryption scheme,
the public key consists of a matrix $B \in \mathbb{Z}_q^{m \times n}$, along with $\ell + 1$ vectors
$u_0, \ldots, u_\ell \in \mathbb{Z}_q^m$. To encrypt a message $w = (w_1, \ldots, w_\ell) \in \mathbb{Z}_q^{\ell}$, the sender chooses a
uniformly random vector $s \leftarrow \mathbb{Z}_q^n$ and an error vector $x \leftarrow \bar{\Psi}_\beta^m$. The ciphertext is

$$y = Bs + \Big(u_0 + \sum_{i=1}^{\ell} w_i \cdot u_i\Big) + x \in \mathbb{Z}_q^m.$$

The scheme is CPA-secure, since the $\mathrm{distLWE}_{q,\beta}$ assumption implies that the
ciphertext is pseudorandom.

The ciphertext produced by the encryption algorithm is a vector $y$ such that
$y - (u_0 + \sum_{i=1}^{\ell} w_i \cdot u_i)$ is “close” to the lattice $\Lambda(B)$ (the exact definition of
“close” depends on the error parameter $\beta$). Decrypting a ciphertext is done by
finding (via exhaustive search over the message space) a message $w$ for which
$y - (u_0 + \sum_{i=1}^{\ell} w_i \cdot u_i)$ is “close” to $\Lambda(B)$, using the following trapdoor structure
first discovered by Ajtai [1], and later improved by Alwen and Peikert [2].


² Interestingly, for our eventual application to PAKE a CCA-secure version of this
scheme would suffice since the scheme has the property that it is possible to efficiently
tell whether a given ciphertext is an encryption of a given message (and this is all
that is needed to prove security for the protocol in Section 3).
Lemma 3 ([1, 2]). Fix integers $q \ge 2$ and $m \ge 4n \log_2 q$. There is a PPT
algorithm $\mathsf{TrapSamp}(1^n, q, m)$ that outputs matrices $B \in \mathbb{Z}_q^{m \times n}$ and $T \in \mathbb{Z}^{m \times m}$ such
that the distribution of $B$ is statistically close to the uniform distribution over
$\mathbb{Z}_q^{m \times n}$, and there is an algorithm $\mathsf{BDDSolve}(T, \cdot)$ that takes as input a vector
$z \in \mathbb{Z}^m$ and does the following:

- if there is a vector $s \in \mathbb{Z}_q^n$ such that $\mathrm{dist}(z, Bs \pmod q) \le \sqrt{q}/4$, then the
  output is $s$.
- if for every vector $s \in \mathbb{Z}_q^n$, $\mathrm{dist}(z, Bs) > \sqrt{q}/4$, then the output is $\bot$.

Proof. $T$ is a full-rank matrix such that (a) each row $t_i$ of $T$ has bounded $\ell_2$ norm,
i.e., $\|t_i\| \le 4\sqrt{m}$, and (b) $TB = 0 \pmod q$. Ajtai [1] and Alwen and Peikert [2]
showed how to sample a pair $(B, T)$ such that $B$ is statistically close to uniform
and $T$ has the above properties.

Given such a matrix $T$ and a vector $z \in \mathbb{Z}^m$, $\mathsf{BDDSolve}(T, z)$ works as follows:

- First, compute $z' = q \cdot T^{-1} \cdot \lfloor (T \cdot z)/q \rceil \pmod q$.
- Compute (using Gaussian elimination) a vector $s \in \mathbb{Z}_q^n$ such that $z' = Bs$
  (if such exists; else, output $\bot$).
- If $\mathrm{dist}(z, Bs) \le \sqrt{q}/4$, then output $s$, else output $\bot$.

First, if $z = Bs + x$ for some $s \in \mathbb{Z}_q^n$ and $x \in \mathbb{Z}^m$ such that $\|x\| \le \sqrt{q}/4$, then
the procedure above computes

$$z' = q \cdot T^{-1} \cdot \lfloor (T \cdot (Bs + x))/q \rceil \pmod q = Bs \pmod q.$$

This is because each coordinate of $Tx$ has magnitude at most
$\|T\| \cdot \|x\| \le 4\sqrt{m} \cdot \sqrt{q}/4 < q$, and consequently,

$$\lfloor (T \cdot (Bs + x))/q \rceil = \lfloor (T \cdot Bs)/q \rceil = T \cdot (Bs)/q,$$

where the final equality is because $TB = 0 \pmod q$.
Finally, if $\mathrm{dist}(z, \Lambda(B)) > \sqrt{q}/4$, then the last line of the procedure above
causes the output to be $\bot$ always.
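The rounding step in the proof can be traced on a tiny hand-built example. Everything concrete below is an assumption made for illustration and is not produced by TrapSamp: the modulus 101, the single-column matrix `B` (so n = 1 and m = 2), and the matrix `T`, which was picked by hand so that its rows are short and `T*B = 0 (mod Q)`.

```python
import math

Q = 101
B = [1, 10]              # one column, so B*s = (s, 10*s) mod Q
T = [[-10, 1], [1, 10]]  # short rows with T*B = (0, 101) = 0 (mod Q)
# For this particular T we have T*T = Q*I, hence Q * T^{-1} = T.


def centered(v):
    """Representative of v mod Q in [-(Q//2), Q//2]."""
    return ((v + Q // 2) % Q) - Q // 2


def round_div(v):
    """Nearest integer to v / Q, computed with exact integer arithmetic."""
    return (2 * v + Q) // (2 * Q)


def bdd_solve(z, bound=math.sqrt(Q) / 4):
    # Step 1: z' = Q * T^{-1} * round(T*z / Q) (mod Q); here Q * T^{-1} = T.
    k = [round_div(sum(t * zj for t, zj in zip(row, z))) for row in T]
    z_prime = [sum(t * kj for t, kj in zip(row, k)) % Q for row in T]
    # Step 2: solve z' = B*s. Trivial here because B[0] = 1.
    s = z_prime[0]
    if any((b * s) % Q != zp for b, zp in zip(B, z_prime)):
        return None
    # Step 3: accept only if z really lies within the bound of B*s.
    err = [centered(zj - b * s) for zj, b in zip(z, B)]
    if math.sqrt(sum(e * e for e in err)) > bound:
        return None
    return s


secret, noise = 7, [1, -2]
z_close = [(b * secret + xj) % Q for b, xj in zip(B, noise)]  # (8, 68)
z_far = [5, 5]                                                # far from every lattice point
```

For `z_close` the product `T*z` is `(-12, 688)`, which rounds to `(0, 7)` after division by 101, and multiplying back by `T` gives `(7, 70) = B*7`. For `z_far` the rounding still lands on a lattice point, but the final distance check rejects it.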


We now modify the decryption algorithm in two ways. The first of these modifi- 
cations ensures that the decryption algorithm runs in polynomial time, and the 
second is needed for our approximate SPH system. 

First, to avoid the exponential dependence of the decryption time on the
message length, we modify the encryption scheme by letting the public key contain
the matrix $A = [B \mid U]$, where the columns of $U \in \mathbb{Z}_q^{m \times (\ell+1)}$ are the vectors
$u_0, \ldots, u_\ell$. The secret key is a trapdoor for the entire matrix $A$ (as opposed to
just $B$ as in the previous description). The ciphertext from the previous
description can then be written as

$$y = A \cdot \begin{pmatrix} s \\ 1 \\ w \end{pmatrix} + x \pmod q,$$

and decryption uses the BDDSolve procedure from Lemma 3 to recover the
vector $(s, 1, w)$. The crucial point is that, during key generation, the receiver can
generate the matrix $A$ along with an appropriate trapdoor for decryption.
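The two ways of writing the ciphertext can be checked against each other numerically. The sketch below is only a sanity check of the algebra; the helper names, the stand-in noise distribution and the tiny parameters are assumptions made for this example.

```python
import random


def matvec(M, v, q):
    """Matrix-vector product mod q, with M given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]


def encrypt_basic(B, us, w, s, x, q):
    """y = B*s + (u_0 + sum_i w_i * u_i) + x, with us = [u_0, ..., u_l]."""
    Bs = matvec(B, s, q)
    y = []
    for j in range(len(B)):
        shift = us[0][j] + sum(wi * u[j] for wi, u in zip(w, us[1:]))
        y.append((Bs[j] + shift + x[j]) % q)
    return y


def encrypt_matrix(B, us, w, s, x, q):
    """y = A*(s, 1, w) + x with A = [B | U], the columns of U being u_0..u_l."""
    A = [list(B[j]) + [u[j] for u in us] for j in range(len(B))]
    v = list(s) + [1] + list(w)
    return [(t + xj) % q for t, xj in zip(matvec(A, v, q), x)]


rng = random.Random(0)
n, m, ell, q = 3, 6, 2, 97  # toy parameters, for illustration only
B = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
us = [[rng.randrange(q) for _ in range(m)] for _ in range(ell + 1)]
w = [rng.randrange(q) for _ in range(ell)]
s = [rng.randrange(q) for _ in range(n)]
x = [rng.randint(-1, 1) for _ in range(m)]  # stand-in for the Gaussian noise

y_basic = encrypt_basic(B, us, w, s, x, q)
y_matrix = encrypt_matrix(B, us, w, s, x, q)
```

Both functions return the same vector, which is why a trapdoor for the whole matrix `A` lets the receiver recover `s` and `w` in one decoding step.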

Secondly, we relax the decryption algorithm so that it finds an $a \in \mathbb{Z}_q$ and a
message $w$ for which $a\big(y - (u_0 + \sum_{i=1}^{\ell} w_i \cdot u_i)\big)$ is “close” to $\Lambda(B)$. This modified
decryption algorithm correctly decrypts the ciphertexts generated by Enc (which
corresponds to the case $a = 1$), but it also decrypts ciphertexts that would never
be output by Enc. This modification to the decryption algorithm enables us to
prove smoothness for the approximate SPH system.


Parameters. Let $n$ be the security parameter, and $\ell = n$ be the message length.
The parameters of the system are a prime $q = q(n, \ell)$, a positive integer
$m = m(n, \ell)$, and a Gaussian error parameter $\beta = \beta(n, \ell) \in (0, 1)$ that defines a
distribution $\bar{\Psi}_\beta$. For concrete instantiations of these parameters, see Theorem 1.
We now describe the scheme:

Key generation. Choose a matrix $A \in \mathbb{Z}_q^{m \times (n+\ell+1)}$ together with the trapdoor
$T$ by running $(A, T) \leftarrow \mathsf{TrapSamp}(1^m, 1^{n+\ell+1}, q)$, where TrapSamp is as
described in Lemma 3. The public key is $A$ and the secret key is $T$.

Encryption. To encrypt the message $w \in \mathbb{Z}_q^{\ell}$ with respect to a public key as
above, the sender chooses $s \leftarrow \mathbb{Z}_q^n$ uniformly at random, and an error vector
$x \leftarrow \bar{\Psi}_\beta^m$. The ciphertext is

$$y = A \cdot \begin{pmatrix} s \\ 1 \\ w \end{pmatrix} + x \pmod q.$$

Decryption. The decryption algorithm works as below.

    for a = 1 to q − 1 do
        Compute (s, a′, w) ← BDDSolve(T, a·y)
        if a′ = a then
            output w/a and stop
        else try the next value of a
    end
    If the above fails for all a, output ⊥

Theorem 1. Let $n, \ell, m, q, \beta$ be chosen such that $m \ge 4(n + \ell) \log q$ and
$\beta < 1/(2 \cdot m^2 n \cdot \omega(\sqrt{\log n}))$. Then the scheme above is a CPA-secure encryption
scheme assuming the hardness of $\mathrm{distLWE}_{n,m,q,\beta}$.


5.2 An Approximate SPH System 


Fix a public key $A \in \mathbb{Z}_q^{m \times (n+\ell+1)}$ for the system (where we write $A = [B \mid U]$, as
usual), and a dictionary $D = \mathbb{Z}_q^{\ell}$. Sets $X$, $L_m$ and $\bar{L}_m$ are defined in Section 2.
(For our purposes, all vectors $y \in \mathbb{Z}_q^m$ are valid ciphertexts.) Let $r$ be such that

$$\sqrt{q} \cdot \omega(\sqrt{\log n}) \le r \le \epsilon/(8 \cdot m n^2 \cdot \beta).$$

(Looking ahead, we remark that the upper bound on $r$ will be used for
correctness, and the lower bound will be used for smoothness.)

A key for the SPH system is a k-tuple of vectors $(e_1, \ldots, e_k)$ where each
$e_i \leftarrow D_{\mathbb{Z}^m, r}$ is drawn independently from the discrete Gaussian distribution. The
reader may want to keep in mind the inverse relationship between the parameters
$r$ and $\beta$: the larger the error parameter $\beta$ in the encryption scheme, the smaller
the discrete-Gaussian radius $r$ (and vice versa).

1. The projection set is $S = (\mathbb{Z}_q^n)^k$. For a key $(e_1, \ldots, e_k) \in (\mathbb{Z}_q^m)^k$, the projection
   is $\alpha(e_1, \ldots, e_k) = (u_1, \ldots, u_k)$, where $u_i = B^T e_i$.

2. We now define the smooth projective hash function $H = \{H_k\}_{k \in K}$. On input
   a key $(e_1, \ldots, e_k) \in K$ and a ciphertext $c = (\mathsf{label}, y, m)$, the hash function
   is computed as follows. First compute

   $$z_i = e_i^T \left[ y - U \cdot \begin{pmatrix} 1 \\ m \end{pmatrix} \right] \in \mathbb{Z}_q.$$

   Treat $z_i$ as a number in $[-(q-1)/2 \ldots (q-1)/2]$ and output $b_1 \ldots b_k \in \{0,1\}^k$
   where

   $$b_i = \begin{cases} 0 & \text{if } z_i < 0 \\ 1 & \text{if } z_i \ge 0. \end{cases}$$

3. On input a projected key $(u_1, \ldots, u_k) \in S$, a ciphertext $c = (\mathsf{label}, y, m)$
   and a witness $s \in \mathbb{Z}_q^n$ for the ciphertext, the hash function is computed as
   $H'(c, s) = b_1 \ldots b_k$ where

   $$b_i = \begin{cases} 0 & \text{if } u_i^T s < 0 \\ 1 & \text{if } u_i^T s \ge 0. \end{cases}$$
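The three items above can be exercised on toy data. The sketch below makes several simplifying assumptions that are not part of the construction: hash-key entries are drawn uniformly from {-2, ..., 2} instead of from a discrete Gaussian, the noise is drawn from {-1, 0, 1}, the parameters are tiny, and the function names are invented for the example. It only illustrates that hashing with the key and hashing with the projected key plus witness agree on most bits.

```python
import random


def centered(v, q):
    """Representative of v mod q in [-(q//2), q//2]."""
    return ((v + q // 2) % q) - q // 2


def sign_bit(v, q):
    return 0 if centered(v, q) < 0 else 1


def project(B, key, q):
    """Projected key: u_i = B^T e_i for every e_i in the hash key."""
    n = len(B[0])
    return [[sum(B[j][c] * e[j] for j in range(len(B))) % q for c in range(n)]
            for e in key]


def hash_with_key(U, key, y, msg, q):
    """H: bit i is the sign of z_i = e_i^T (y - U*(1, msg)) mod q."""
    v = [1] + list(msg)
    t = [(yj - sum(a * b for a, b in zip(row, v))) % q for row, yj in zip(U, y)]
    return [sign_bit(sum(ej * tj for ej, tj in zip(e, t)), q) for e in key]


def hash_with_witness(proj, s, q):
    """H': bit i is the sign of u_i^T s mod q."""
    return [sign_bit(sum(uc * sc for uc, sc in zip(u, s)), q) for u in proj]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(1)
n, m, ell, k, q = 4, 32, 2, 64, 12289  # toy parameters
B = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
U = [[rng.randrange(q) for _ in range(ell + 1)] for _ in range(m)]
s = [rng.randrange(1, q) for _ in range(n)]
w = [rng.randrange(q) for _ in range(ell)]
key = [[rng.randint(-2, 2) for _ in range(m)] for _ in range(k)]
proj = project(B, key, q)


def encrypt(x):
    """y = B*s + U*(1, w) + x (mod q)."""
    v = [1] + w
    return [(sum(a * b for a, b in zip(brow, s)) +
             sum(a * b for a, b in zip(urow, v)) + xj) % q
            for brow, urow, xj in zip(B, U, x)]


y_exact = encrypt([0] * m)
y_noisy = encrypt([rng.randint(-1, 1) for _ in range(m)])
h_witness = hash_with_witness(proj, s, q)
distance = hamming(hash_with_key(U, key, y_noisy, w, q), h_witness)
```

Without noise the two computations agree on every bit. With noise, a bit can flip only when $u_i^T s$ lands within $|e_i^T x|$ of a sign boundary, which is why only approximate correctness is available.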


Theorem 2. Let the parameters $n, \ell, m, q, \beta$ be as in Theorem 1 and $r$ be as
above. Then, $H = \{H_k\}_{k \in K}$ is an $\epsilon$-approximate smooth projective hash system.

Proof. Clearly, the following procedures can all be done in polynomial time:
(1) sampling a uniform key for the hash function $(e_1, \ldots, e_k) \leftarrow (D_{\mathbb{Z}^m, r})^k$,
(2) computing the hash function $H$ on input the key $(e_1, \ldots, e_k)$ and a
ciphertext $c$, (3) computing the projection key $\alpha(e_1, \ldots, e_k)$, and (4) computing the
hash function given the projected key $(u_1, \ldots, u_k)$, a ciphertext $c$, and a witness
$s$ for the ciphertext $c$.

Approximate correctness. We now show $\epsilon$-approximate correctness. Consider
any $(\mathsf{label}, y, m) \in L$, i.e., where $y$ is a ciphertext produced by the encryption
algorithm on input the message $m$. This means that $y$ can be written as

$$y = Bs + U \cdot \begin{pmatrix} 1 \\ m \end{pmatrix} + x \pmod q, \qquad (1)$$

where $\|x\| \le \beta q \cdot \sqrt{mn}$ (recall we work with truncated Gaussians).
We first show that for each $i \in [k]$, the values $z_i$ (computed using the key) and
$s^T u_i$ (computed using the projected key) are “close”. More precisely, we show
that $|z_i - u_i^T s| \le \epsilon/2 \cdot (q/4)$. This follows because

$$|z_i - u_i^T s| = |e_i^T (Bs + x) - u_i^T s| = |e_i^T x|, \qquad (2)$$

where the first equality uses the fact that $y$ can be written as in Equation (1),
and the second uses the fact that $u_i^T = e_i^T B$. Now,
$|e_i^T x| \le \|e_i\| \cdot \|x\| \le (r\sqrt{mn}) \cdot (\beta q \sqrt{mn}) \le \epsilon/2 \cdot q/4$.

Each $u_i$ is statistically close to uniform, by an application of the leftover hash
lemma; in particular, this means that $s^T u_i \in \mathbb{Z}_q$ is uniformly random.³ Let $b_i$
be the $i$-th bit of $H_{(e_1, \ldots, e_k)}(c)$ and $b'_i$ be the $i$-th bit of $H'_{(u_1, \ldots, u_k)}(c, s)$. Using
Equation (2), we see that the probability that $b_i \ne b'_i$ (over the randomness
of $e_i$) is at most $\epsilon/2$. Thus, by a Chernoff bound, the Hamming distance between
$H_{(e_1, \ldots, e_k)}(c)$ and $H'_{(u_1, \ldots, u_k)}(c, s)$ is at most $\epsilon k$ with overwhelming probability.
This shows approximate correctness.
Smoothness. Consider any $(\mathsf{label}, y, m) \in X \setminus \bar{L}$. By definition of $\bar{L}$, this means
that the decryption algorithm, on input $(\mathsf{label}, y, m)$ and any possible secret key
sk, does not output $m$. In other words, the decryption algorithm outputs either
$\bot$, or a message $m' \ne m$. Define

$$z := y - U \cdot \begin{pmatrix} 1 \\ m \end{pmatrix} \quad \text{and} \quad z' := y - U \cdot \begin{pmatrix} 1 \\ m' \end{pmatrix}.$$

We will show that for every non-zero $a \in \mathbb{Z}_q$, $az$ is far from $\Lambda(B)$. More
precisely, we will show that for every non-zero $a \in \mathbb{Z}_q$,

$$\mathrm{dist}(az, \Lambda(B)) > \sqrt{q}/4.$$

An application of Lemma 2 then shows that for every $i \in [k]$, the pair $(e_i^T B, e_i^T z)$
is statistically close to the uniform distribution over $\mathbb{Z}_q^{n+1}$.
Let us analyze the two cases:

- The output of the decryption algorithm is $\bot$. In particular, this means that
  for every $a \in [1 \ldots q-1]$, the vector $az$ is far from $\Lambda(B)$.

- The output of the decryption algorithm is a message $m' \ne m$. This could
  happen only if there is an $a' \in \mathbb{Z}_q$ such that $a'z'$ is close to the lattice $\Lambda(B)$.
  Suppose, for contradiction, that $az$ is close to $\Lambda(B)$ as well. The claim below
  shows that this cannot happen with high probability over the random choice
  of $U$. Thus, with high probability, $az$ is far from $\Lambda(B)$.

Claim. The following event happens with negligible probability over the
uniformly random choice of $U \in \mathbb{Z}_q^{m \times (\ell+1)}$: there exist numbers $a, a' \in \mathbb{Z}_q$, vectors
$m \ne m' \in \mathbb{Z}_q^{\ell}$ and a vector $y \in \mathbb{Z}_q^m$ s.t.

$$\mathrm{dist}(az, \Lambda(B)) \le \sqrt{q}/4 \quad \text{and} \quad \mathrm{dist}(a'z', \Lambda(B)) \le \sqrt{q}/4.$$

Proof. Fix some $a, a' \in \mathbb{Z}_q$, $m \ne m' \in \mathbb{Z}_q^{\ell}$ and $y \in \mathbb{Z}_q^m$. We first observe
that since the vectors $\begin{pmatrix} 1 \\ m \end{pmatrix}$ and $\begin{pmatrix} 1 \\ m' \end{pmatrix}$ are linearly independent and $U$
is uniformly random, the vectors $az$ and $a'z'$ are uniformly random and
(statistically) independent. Applying Lemma 1, we get that

$$\Pr_{U \in \mathbb{Z}_q^{m \times (\ell+1)}}\left[\mathrm{dist}(az, \Lambda(B)) \le \sqrt{q}/4 \text{ and } \mathrm{dist}(a'z', \Lambda(B)) \le \sqrt{q}/4\right] \le (q^{-m/2} \cdot \mathrm{negl}(n))^2 = q^{-m} \cdot \mathrm{negl}(n).$$

Now, an application of the union bound shows that the required probability is
at most $q^2 \cdot q^{2\ell} \cdot q^m \cdot (q^{-m} \cdot \mathrm{negl}(n))$, which is negligible in $n$.

This completes the proof of Theorem 2.

³ This holds only for $s \ne 0$. We omit consideration of this technical issue for the
purposes of this paper.


6 A CCA-Secure Encryption Scheme Based on Lattices 


In this section we describe a CCA-secure encryption scheme, along with an
approximate SPH system, based on the hardness of the LWE problem. The
CCA-secure encryption scheme builds on the CPA-secure encryption scheme described
in Section 5.1, and the SPH system is the same as the one from Section 5.2 with
a few modifications.


6.1 A CCA-Secure Encryption Scheme


The encryption scheme is similar to the schemes in  (which, themselves,
are instantiations of the general construction of Rosen and Segev [24]). The main
difference between  and our scheme is the relaxed notion of decryption,
which we already use in the CPA-secure construction in Section 5.1. A formal
description of the scheme follows.


Parameters. Let $n$ be the security parameter, and $\ell = \mathrm{poly}(n)$ be the message
length. The parameters of the system are a prime $q = q(n, \ell)$, an integer
$m = m(n, \ell) \in \mathbb{Z}^+$, and a Gaussian error parameter $\beta = \beta(n, \ell) \in (0, 1]$ that defines a
distribution $\bar{\Psi}_\beta$. For concrete instantiations of these parameters, see Theorem 3.


Key generation. For $i \in [n]$ and $b \in \{0,1\}$, choose $2n$ matrices
$A_{i,b} \in \mathbb{Z}_q^{m \times (n+\ell+1)}$ together with short bases $S_{i,b} \in \mathbb{Z}^{m \times m}$ for $\Lambda^{\perp}(A_{i,b})$.
More precisely, let

$$(A_{i,b}, S_{i,b}) \leftarrow \mathsf{TrapSamp}(1^m, 1^{n+\ell+1}, q),$$

where TrapSamp is as described in Lemma 3. Output the public and secret keys

$$pk = \{A_{i,0}, A_{i,1}\}_{i \in [n]} \quad \text{and} \quad sk = \{S_{1,0}, S_{1,1}\}.$$

(Note that the receiver does not use the trapdoors for $i > 1$ and so the $\{A_{i,b}\}_{i>1}$
could, in fact, simply be chosen at random.)
Encryption. To encrypt the message $w \in \mathbb{Z}_q^{\ell}$ with respect to a public key as
above, the sender first generates a key pair $(VK, SK) \leftarrow \mathsf{SigKeyGen}(1^n)$ for a
one-time signature scheme; let $VK = VK_1, \ldots, VK_n$ denote the bits of the verification
key. Define the matrix $A_{VK}$ as

$$A_{VK} = \begin{pmatrix} A_{1,VK_1} \\ \vdots \\ A_{n,VK_n} \end{pmatrix}$$

Choose $s \leftarrow \mathbb{Z}_q^n$ uniformly at random, and choose an error vector
$x \leftarrow \bar{\Psi}_\beta^{mn}$. The ciphertext is $(VK, y, \sigma)$ where

$$y = A_{VK} \cdot \begin{pmatrix} s \\ 1 \\ w \end{pmatrix} + x \pmod{q}$$

and $\sigma = \mathsf{Sign}_{SK}(y)$.
Decryption. To decrypt a ciphertext $(VK, y, \sigma)$, first verify that $\sigma$ is a correct
signature on $y$ and output $\perp$ if not. Otherwise, parse $y$ into $n$ consecutive blocks
$y_1, \ldots, y_n$, where $y_i \in \mathbb{Z}_q^m$. Then,
for $a = 1$ to $q - 1$ do
    Compute $t := (s, a', w)^T \leftarrow \mathsf{BDDSolve}(T_{1,VK_1}, a \cdot y_1)$
    if $a' = a$ then
        if $\|A_{i,VK_i} \cdot t - a \cdot y_i\| \le \sqrt{q}/4$ for all $i \in [n]$ then
            output $w/a$ and stop
    else try the next value of $a$
end
If the above fails for all $a$, output $\perp$.


Theorem 3. Let $n, \ell, m, q, \beta$ be such that $m > 4(n + \ell) \log q$ and
$\beta < 1/(2 \cdot m^2 n \cdot \omega(\sqrt{\log n}))$. Then, the scheme above is a CCA-secure
encryption scheme assuming the hardness of $\mathsf{distLWE}_{n,m,q,\beta}$.


The proof of correctness is similar to that of the CPA-secure encryption scheme.
CCA-security follows from the ideas of the earlier schemes. As we observed, the
main change between our encryption scheme and the earlier ones is that the
decryption algorithm tries to decrypt "all multiples of the ciphertext". We defer
the details of the proof to the full version.


6.2 An Approximate SPH System 


Fix a public key $\{A_{i,0}, A_{i,1}\}_{i \in [n]}$, and a password dictionary
$D \subseteq \mathbb{Z}_q^{\ell}$. The main difference from the earlier presentation is in the
definition of ciphertext validity: now, a labeled ciphertext $(\mathsf{label}, VK, y, \sigma)$ is
defined to be valid if $\mathsf{Verify}_{VK}(\mathsf{label} \| y, \sigma) = \mathsf{accept}$.
Clearly, all honestly generated ciphertexts are valid and this condition can be
checked in polynomial time. We define the sets $X$, $L_m$, and $\bar{L}_m$ for $m \in D$
exactly as before.

As before, a hash key is a $k$-tuple of vectors $(e_1, \ldots, e_k)$ where each
$e_i \leftarrow D_{\mathbb{Z}^m, r}$ is drawn independently from the discrete Gaussian distribution.
The projection function and the hash computation are the same, except that
here they use the matrices $B_{VK}$ and $U_{VK}$ respectively (instead of $B$ and $U$).
In particular, this means that the projection function depends on
the ciphertext (as allowed by the definition of an approximate SPH). The proof
of the theorem below follows analogously to that of the earlier SPH theorem; we
defer the proof to the full version of this paper.


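Theorem 4. Let $m > 4(n + \ell) \log q$, $\beta < 1/(2 \cdot m^2 n \cdot \omega(\sqrt{\log n}))$
and $r$ be such that

$$\sqrt{q} \cdot \omega(\sqrt{\log n}) \le r \le \varepsilon / (8 \cdot m n^2 \cdot \beta).$$

Then $\mathcal{H} = \{H_k\}_{k \in K}$ is an $\varepsilon$-approximate smooth projective hash system.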
PSS Is Secure against Random Fault Attacks
Abstract. A fault attack consists in inducing hardware malfunctions in
order to recover secrets from electronic devices. One of the most famous
fault attacks is Bellcore's attack against RSA with CRT; it consists in
inducing a fault modulo p but not modulo q at the signature generation
step; then by taking a gcd the attacker can recover the factorization of
N = pq. The Bellcore attack applies to any encoding function that is
deterministic, for example FDH. Recently, the attack was extended to
randomized encodings based on the ISO/IEC 9796-2 signature standard.
Extending the attack to other randomized encodings remains an open
problem.

In this paper, we show that the Bellcore attack cannot be applied to 
the PSS encoding; namely we show that PSS is provably secure against 
random fault attacks in the random oracle model, assuming that invert- 
ing RSA is hard. 


Keywords: Probabilistic Signature Scheme, Provable Security, Fault 
Attacks, Bellcore Attack. 


1 Introduction 


RSA [14] is still the most widely used signature scheme in practical applications.
To sign a message $m$ with RSA, the signer first applies an encoding function $\mu$ to
$m$, and then computes the signature $\sigma = \mu(m)^d \bmod N$. The signature is verified
by checking that $\sigma^e = \mu(m) \bmod N$. For efficiency reasons RSA signatures are
often computed using the Chinese Remainder Theorem (CRT); in this case the
signature is first computed modulo $p$ and $q$ separately:

$$\sigma_p = m^d \bmod p, \qquad \sigma_q = m^d \bmod q$$

and then $\sigma_p$ and $\sigma_q$ are combined by CRT to form the signature $\sigma$.

Boneh, DeMillo and Lipton showed that RSA signatures computed with CRT
can be vulnerable to fault attacks [3]. If the attacker can induce a fault when $\sigma_q$
is computed while keeping the computation of $\sigma_p$ correct, one obtains:

$$\sigma_p = m^d \bmod p, \qquad \sigma_q \neq m^d \bmod q$$

and the resulting faulty signature $\sigma$ satisfies

$$\sigma^e = m \bmod p, \qquad \sigma^e \neq m \bmod q.$$


M. Matsui (Ed.): ASIACRYPT 2009, LNCS 5912, pp. 653 [ooe] 2009. 
Therefore, given one faulty signature $\sigma$, the attacker can recover the factorization
of $N$ by computing $\gcd(\sigma^e - m \bmod N, N) = p$. This attack actually applies
to any deterministic RSA encoding, e.g. Full Domain Hash (FDH) with
$\sigma = H(m)^d \bmod N$.
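
The gcd computation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two Mersenne primes, the public exponent 65537, the SHA-256-based stand-in for FDH (`fdh`) and the helper names `crt_sign` and `bellcore_factor` are all hypothetical choices made for the example, not parameters or functions from the paper.

```python
import hashlib
from math import gcd

# Toy parameters (assumption): two Mersenne primes, far too small for real use.
p = 2**127 - 1
q = 2**89 - 1
N = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

def fdh(message: bytes) -> int:
    """Hypothetical deterministic encoding: a SHA-256 digest reduced modulo N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def crt_sign(mu: int, fault_q=None) -> int:
    """Sign mu with CRT; if fault_q is given, it replaces the half computed modulo q."""
    sig_p = pow(mu, d % (p - 1), p)
    sig_q = pow(mu, d % (q - 1), q) if fault_q is None else fault_q
    # Garner recombination: the result equals sig_p mod p and sig_q mod q.
    h = ((sig_p - sig_q) * pow(q, -1, p)) % p
    return (sig_q + q * h) % N

def bellcore_factor(signature: int, mu: int) -> int:
    """gcd(sigma^e - mu mod N, N): equals p when only the half modulo q is faulty."""
    return gcd((pow(signature, e, N) - mu) % N, N)

mu = fdh(b"hello")
good = crt_sign(mu)
bad = crt_sign(mu, fault_q=12345)   # an arbitrary wrong value modulo q
recovered = bellcore_factor(bad, mu)
```

With a correct signature the gcd is simply N and reveals nothing; with the faulty one it equals p, because the attacker can recompute the deterministic encoding.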

More generally, the attack applies to any probabilistic scheme where the
randomness used to generate the signature is sent along with the signature, e.g. as in the
Probabilistic Full Domain Hash (PFDH) encoding [6] where the signature is $\sigma \| r$
with $\sigma = H(m \| r)^d \bmod N$. In that case, given the faulty value of $\sigma$ and knowing
$r$, the attacker can still factor $N$ by computing $\gcd(\sigma^e - H(m \| r) \bmod N, N) = p$.

However, if the random $r$ is not given to the attacker along with the signature
$\sigma$ then the Bellcore attack is thwarted. This is the case for signatures of the
form $\sigma = \mu(m, r)^d \bmod N$ where the random $r$ is only recovered when verifying
the signature, as in PSS [2]. To recover $r$ one needs a correct signature; from
a faulty signature, the attacker can neither retrieve $r$ nor infer $\mu(m, r)$ in order to
compute $\gcd(\sigma^e - \mu(m, r) \bmod N, N) = p$, unless $r$ is short enough to be guessed
by exhaustive search. Note that obtaining another correct signature for $m$ would
not help the attacker since with high probability a different random $r'$ would be
used to generate this signature.

Recently, it was shown how to extend Bellcore's attack to a large class of
randomized RSA encoding schemes [7]. The extended attack was illustrated with
the ISO/IEC 9796-2 standard [11]. ISO/IEC 9796-2 is originally a deterministic
encoding scheme but is often used in combination with message randomization, as
in the EMV standard [8]. The ISO/IEC 9796-2 encoded message has the form

$$\mu(m) = \mathtt{6A}_{16} \,\|\, m[1] \,\|\, H(m) \,\|\, \mathtt{BC}_{16}$$

where $m = m[1] \,\|\, m[2]$ is split into two parts. The authors of [7] showed that if
the randomness introduced into $m[1]$ is not too large (e.g. less than 160 bits for
a 2048-bit RSA modulus), then a single faulty signature suffices to factor $N$ as
in the original Bellcore attack. The attack is based on Coppersmith's technique
for finding small roots of polynomial equations [5], which is based on the LLL
algorithm [12].

However, extending the attack to other randomized RSA signatures remains 
an open problem. In particular, it is natural to ask whether the Bellcore attack 
could apply to PSS [2], the most popular RSA-based signature scheme. In this
paper, we show that the Bellcore attack cannot be extended to PSS; namely we 
show that PSS is provably secure against random fault attacks in the random 
oracle model, assuming that inverting RSA is hard. 

More precisely, we consider an extended model of security in which the
attacker, in addition to the regular signing oracle, has access to a faulty signature
oracle; that is, the attacker can request faulty signatures either modulo $p$ or
modulo $q$. For a faulty signature modulo $q$, the signer first generates the correct
value modulo $p$:

$$\sigma_p = \mu(m, r)^d \bmod p$$

but generates a random $\sigma_q$ modulo $q$. With CRT the signer then computes $\sigma'$
such that $\sigma' = \sigma_p \bmod p$ and $\sigma' = \sigma_q \bmod q$, and returns the faulty signature
$\sigma'$ to the adversary. Our result is that PSS is still secure under this extended
notion of security, in the random oracle model, assuming that inverting RSA is
hard.


2 Security Model 


We recall the definition of a signature scheme. 


Definition 1 (signature scheme). A signature scheme (Gen, Sign, Verify) is
defined as follows:


- The key generation algorithm Gen is a probabilistic algorithm which, given
$1^k$, outputs a pair of matching public and private keys, $(pk, sk)$.

- The signing algorithm Sign takes the message $M$ to be signed, the public key
$pk$ and the private key $sk$, and returns a signature $x = \mathsf{Sign}_{sk}(M)$. The signing
algorithm may be probabilistic.

- The verification algorithm Verify takes a message $M$, a candidate
signature $x'$ and $pk$. It returns a bit $\mathsf{Verify}_{pk}(M, x')$, equal to one if the
signature is accepted, and zero otherwise. We require that if $x \leftarrow \mathsf{Sign}_{sk}(M)$, then
$\mathsf{Verify}_{pk}(M, x) = 1$.


In the existential unforgeability under an adaptive chosen message attack sce- 
nario, the forger can dynamically obtain signatures of messages of his choice and 
attempts to output a valid forgery. A valid forgery is a message/signature pair 
$(M, x)$ such that $\mathsf{Verify}_{pk}(M, x) = 1$ whereas the signature of $M$ was never
requested by the forger. 

In the following, we consider an extended model of security in which the
attacker, in addition to the regular signing oracle, has access to a faulty signature
oracle; that is, the attacker can request faulty signatures either modulo $p$ or
modulo $q$. For a faulty signature modulo $q$, the signer first generates the correct
value modulo $p$:

$$\sigma_p = \mu(m, r)^d \bmod p$$

and generates a random $\sigma_q$ modulo $q$. With CRT the signer then computes $\sigma'$
such that $\sigma' = \sigma_p \bmod p$ and $\sigma' = \sigma_q \bmod q$, and returns the faulty signature
$\sigma'$ to the adversary. This is actually equivalent to first computing a correct
signature $\sigma$:

$$\sigma = \mu(m, r)^d \bmod N$$

and then generating a random $u$ modulo $q$ and computing the faulty signature:

$$\sigma' = \sigma + u \cdot p \bmod N$$

Formally, we consider the following scenario between a challenger and an
attacker. Our scenario applies to any RSA-based signature scheme in which a
signature $\sigma$ is computed as $\sigma = \mu(m, r)^d \bmod N$ for some (randomized)
encoding function $\mu(m, r)$.
Setup: the challenger generates an RSA modulus $N = p \cdot q$, a public exponent
$e$ such that $\gcd(e, \phi(N)) = 1$ and a private exponent $d$ such that
$e \cdot d = 1 \bmod \phi(N)$. The challenger sends $(N, e)$ to the adversary.


Queries: the adversary can make regular signature queries to the challenger. In
this case, given a message $m$, the challenger generates a random $r$ and outputs
the (correct) signature:

$$\sigma = \mu(m, r)^d \bmod N$$

Additionally, the attacker can make faulty signature queries. For every such
query, the attacker specifies whether the fault should be modulo $p$ or modulo $q$.
For a faulty signature modulo $q$, the challenger first generates a random $r$ and
computes the correct signature:

$$\sigma = \mu(m, r)^d \bmod N$$

Then the challenger generates a random $u$ modulo $q$, and computes:

$$\sigma' = \sigma + u \cdot p \bmod N$$

and sends $\sigma'$ to the attacker. The challenger proceeds similarly if a faulty
signature modulo $p$ is requested.
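
The faulty-signature query can be illustrated with a short Python sketch. It assumes two small primes chosen only so that every value of u can be enumerated, and the function name `faulty_signature_mod_q` is hypothetical; the point is that adding u·p leaves the residue modulo p untouched while the residue modulo q runs over all of Z_q as u varies.

```python
import secrets

def faulty_signature_mod_q(sigma: int, p: int, q: int, u=None) -> int:
    """Return sigma + u*p mod N for a random (or supplied) u modulo q."""
    N = p * q
    if u is None:
        u = secrets.randbelow(q)
    return (sigma + u * p) % N

# Small illustrative primes (assumption), so that all u can be enumerated.
p, q = 1009, 1013
sigma = 123456  # stands in for a correct signature modulo N
all_faults = [faulty_signature_mod_q(sigma, p, q, u) for u in range(q)]
residues_mod_q = {f % q for f in all_faults}
```

Since p is invertible modulo q, the map from u to the residue modulo q is a bijection, which is why a uniform u gives a uniform faulty half.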


Forgery: eventually the attacker must output a forgery, that is a message/signature
pair $(m, x)$ such that $\mathsf{Verify}_{pk}(m, x) = 1$ whereas the signature of $m$ was
never requested by the forger, neither as a regular signature query nor in a faulty
signature query.


This completes the description of the attack scenario. As usual, we say that a
signature scheme is $(t, \varepsilon)$-secure if no adversary running in time $t$ can output a
forgery with probability better than $\varepsilon$.

The PSS scheme was proven secure in the random oracle model [1], and our
security proof with faulty signatures is also in the random oracle model. It is
well known that a security proof in the random oracle model does not necessarily
imply that a scheme is secure in the real world (see [4]). Although it is always
better to have a security proof in the standard model, we think that it is still
better to have a proof in the random oracle model than no proof at all.


2.1 Why Random Faults? 


In our security model we have assumed that when a faulty signature $\sigma'$ is
obtained, it has the uniform distribution modulo $p$ (or modulo $q$). This could be
seen as a very strong assumption; namely in practice the faults might have a
completely non-random distribution. Consider for example a fault attack forcing
the values of the registers to zero. This gives $\sigma_p = 0$ and recovering
$p$ is then straightforward: simply compute $\gcd(\sigma', N) = p$. To prevent this
attack we could assume that when a fault occurs the value $\sigma_p$ still has enough
min-entropy.

In the following we argue that 1) the random fault assumption is almost
unavoidable if we want to obtain a security proof and 2) such an assumption might
actually be reasonable in practice.
Assume that a fault gives a random $\sigma_p \bmod p$ but with the $k$ most significant
bits set to 0, for some small integer $k$. That is, the attacker can obtain a list of
faulty signatures $\sigma'_i$ such that the corresponding $\sigma'_{i,p} = \sigma'_i \bmod p$ satisfy:


$$0 \le \sigma'_{i,p} < p / 2^{k} \qquad (1)$$


for all $1 \le i \le n$, where $n$ is the number of faulty signatures. We show how to
recover $p$, using an attack similar to [13]. With LLL [12], the attacker computes
a short vector $(u_1, \ldots, u_n)$ such that:

$$\sum_{i=1}^{n} u_i \cdot \sigma'_i = 0 \bmod N$$

This implies:

$$\sum_{i=1}^{n} u_i \cdot \sigma'_{i,p} = 0 \bmod p$$

Since from (1) the $\sigma'_{i,p}$ are small modulo $p$, if the $u_i$'s are small enough, then the
equality will hold not only modulo $p$ but also over $\mathbb{Z}$:

$$\sum_{i=1}^{n} u_i \cdot \sigma'_{i,p} = 0$$

This gives a vector $(u_1, \ldots, u_n)$ that is orthogonal in $\mathbb{Z}$ to the unknown vector
$(\sigma'_{1,p}, \ldots, \sigma'_{n,p})$. It is shown in [13] that by generating sufficiently many such
vectors, one can recover the unknown vector $(\sigma'_{1,p}, \ldots, \sigma'_{n,p})$ and eventually $p$.

Note that this attack applies to any RSA-based signature scheme with CRT,
not only to PSS. This attack shows it is not enough for $\sigma_p$ to have min-entropy,
as the loss of only a few bits of entropy compared to the uniform distribution
suffices to recover $p$. Therefore, if we want to obtain a security proof, it seems necessary
to assume that $\sigma_p$ is uniformly distributed modulo $p$.

Actually the random fault assumption might be reasonable in practice. Namely,
to prevent probing attacks, the data transmitted on the memory bus
inside the microprocessor is usually encrypted. Therefore, the content of a
register after a fault attack could still be some encrypted value, so it can be reasonable
to model this register value as uniformly random.


3 PSS Is Secure against Random Fault Attacks 


3.1 The PSS Scheme 


We recall the definition of the PSS scheme [2]. The scheme uses three hash
functions $h : \{0,1\}^* \to \{0,1\}^{k_1}$, $g_1 : \{0,1\}^{k_1} \to \{0,1\}^{k_0}$ and
$g_2 : \{0,1\}^{k_1} \to \{0,1\}^{k - k_0 - k_1 - 1}$, where $k$, $k_0$ and $k_1$ are parameters.

Key Generation: generate a $k$-bit RSA modulus $N = pq$, and a random
exponent $e \in \mathbb{Z}^*_{\phi(N)}$. Generate $d$ such that $e \cdot d = 1 \bmod \phi(N)$. The public key is
$(N, e)$; the private key is $(N, d)$.
Fig. 1. PSS: the components of the image $y = 0 \| w \| r^* \| g_2(w)$ are darkened. The
signature of $m$ is $y^d \bmod N$.


Signature generation: given a message $m$, do the following:
1. $r \leftarrow \{0,1\}^{k_0}$

2. $w \leftarrow h(m \| r)$

3. $r^* \leftarrow g_1(w) \oplus r$

4. $y \leftarrow 0 \| w \| r^* \| g_2(w)$

5. Return $\sigma = y^d \bmod N$


Signature Verification: given a message $m$ and a signature $\sigma$, do the following:
1. Let $y = \sigma^e \bmod N$

2. Parse $y$ as $0 \| w \| r^* \| \gamma$. If the parsing fails return 0.

3. $r \leftarrow r^* \oplus g_1(w)$

4. If $h(m \| r) = w$ and $g_2(w) = \gamma$ return 1.

5. else return 0.
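
The signing and verification steps above can be sketched as follows. This is a toy model under loud assumptions: the primes are two small Mersenne primes, k_0 = k_1 = 64 is an arbitrary illustrative choice, and the random oracles h, g_1, g_2 are replaced by a hypothetical SHAKE-256-based helper `oracle`; none of these choices come from the paper, and the code is not a standards-compliant RSASSA-PSS implementation.

```python
import hashlib
import secrets

p, q = 2**127 - 1, 2**89 - 1      # toy primes (assumption), far too small for real use
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
k = N.bit_length()                # bit length of the modulus
k0, k1 = 64, 64                   # illustrative sizes of r and w
k2 = k - k0 - k1 - 1              # output length of g2

def oracle(tag: bytes, data: bytes, bits: int) -> int:
    """Hypothetical stand-in for a random oracle: SHAKE-256 truncated to `bits` bits."""
    nbytes = (bits + 7) // 8
    digest = hashlib.shake_256(tag + data).digest(nbytes)
    return int.from_bytes(digest, "big") >> (8 * nbytes - bits)

def to_bytes(x: int, bits: int) -> bytes:
    return x.to_bytes((bits + 7) // 8, "big")

def h(m: bytes, r: int) -> int:
    return oracle(b"h", m + to_bytes(r, k0), k1)

def g1(w: int) -> int:
    return oracle(b"g1", to_bytes(w, k1), k0)

def g2(w: int) -> int:
    return oracle(b"g2", to_bytes(w, k1), k2)

def encode(m: bytes, r: int) -> int:
    """mu(m, r) = 0 || w || r* || g2(w); the leading 0 bit is implicit since y < 2^(k-1)."""
    w = h(m, r)
    r_star = g1(w) ^ r
    return (w << (k0 + k2)) | (r_star << k2) | g2(w)

def sign(m: bytes) -> int:
    r = secrets.randbits(k0)
    return pow(encode(m, r), d, N)

def verify(m: bytes, sigma: int) -> bool:
    y = pow(sigma, e, N)
    if y >> (k - 1):              # parsing fails unless the leading bit is 0
        return False
    w = y >> (k0 + k2)
    r_star = (y >> k2) & ((1 << k0) - 1)
    gamma = y & ((1 << k2) - 1)
    r = r_star ^ g1(w)
    return h(m, r) == w and g2(w) == gamma

sigma = sign(b"message")
faulty = (sigma + p) % N          # a faulty signature modulo q, as in the security model
```

Note that verification recovers r only from a correct signature; the faulty value fails to parse or fails the hash checks, so it does not hand r to the attacker.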


3.2 Security Proof 


We first give an intuition of the proof. We denote by $\mu(m, r)$ the PSS encoding
scheme, that is $\mu(m, r) = 0 \| w \| r^* \| g_2(w)$ where $w = h(m \| r)$ and
$r^* = g_1(w) \oplus r$.

We receive as input a challenge $(N, e, \eta)$ and we must output $\eta^d \bmod N$. In
the original PSS security proof [2], when receiving a signature query, the simulator
generates a random $\alpha$ modulo $N$ such that $\alpha^e \bmod N$ can be written as $0 \| w \| s \| t$.
The simulator generates a random $r$ of $k_0$ bits. Then it lets $h(m, r) = w$,
$g_1(w) = s \oplus r$ and $g_2(w) = t$. Therefore we have that $\mu(m, r) = (\alpha^e \bmod N)$. The
simulator can then return $\alpha$ as a signature for $m$. When receiving a hash query for
$h(m, r)$, the simulator generates a random $\alpha$ modulo $N$ such that $\eta \cdot \alpha^e$ can be
written as $0 \| w \| s \| t$; it then proceeds as previously. In this case we have
$\mu(m, r) = (\eta \cdot \alpha^e \bmod N)$. Therefore a forgery for $(m, r)$ makes it possible to
compute $\eta^d \bmod N$.

One can see that if there is no collision on the random values $r$ used for signature
generation, and no collision on the values $w$, then the simulation is perfect. Then
given a forgery $\sigma'$ for some message $m'$, with high probability we have that
$\mu(m', r') = (\eta \cdot \alpha^e \bmod N)$ for some known $\alpha$. Therefore from
$\sigma' = \mu(m', r')^d \bmod N$ one can compute $\eta^d = \sigma' / \alpha \bmod N$ as required and
solve the RSA challenge.
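
The embedding of the challenge and the final extraction step can be sketched as below. This is a simplified illustration, assuming toy Mersenne primes and hypothetical helper names (`embed_challenge`, `extract`); the private exponent d appears only to play the role of a successful forger and to check the result, which the real simulator of course cannot do.

```python
import secrets

p, q = 2**127 - 1, 2**89 - 1   # toy primes (assumption)
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))  # used only to emulate the forger and to check
k = N.bit_length()

def embed_challenge(eta: int):
    """Sample alpha until eta * alpha^e mod N has a leading 0 bit (parses as 0||w||s||t)."""
    while True:
        alpha = secrets.randbelow(N - 1) + 1
        y = (eta * pow(alpha, e, N)) % N
        if y < 2 ** (k - 1):
            return alpha, y

def extract(sigma_forged: int, alpha: int) -> int:
    """From sigma' = (eta * alpha^e)^d = eta^d * alpha, recover eta^d mod N."""
    return (sigma_forged * pow(alpha, -1, N)) % N

eta = secrets.randbelow(N - 1) + 1   # the RSA challenge
alpha, y = embed_challenge(eta)      # y plays the role of mu(m, r)
forgery = pow(y, d, N)               # what a successful forger would output
solution = extract(forgery, alpha)
```

Each sampling attempt succeeds with probability close to 1/2 here, which is why the proof later bounds the number of retries.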

In our extended model of security, we must additionally simulate a faulty
signature oracle. To do this, one could first generate as previously a random
$\alpha$ modulo $N$ such that $\alpha^e \bmod N$ can be written as $0 \| w \| s \| t$. The simulator
generates a random $r$ of $k_0$ bits. Then it lets $h(m, r) = w$, $g_1(w) = s \oplus r$ and
$g_2(w) = t$, so that again $\mu(m, r) = (\alpha^e \bmod N)$. Then instead of returning the
correct signature $\alpha$, the simulator could generate a random $u$ modulo $q$, and
output the faulty signature:

$$\alpha' = \alpha + u \cdot p \bmod N \qquad (2)$$

Obviously our simulator cannot do this, because it does not know the prime
factors $p$ and $q$. Instead we show that the distribution of $\alpha'$ is statistically close
to uniform in $\mathbb{Z}_N$; therefore, the simulator can simply return a random $\alpha' \in \mathbb{Z}_N$.

Since RSA is a permutation, instead of considering the distribution of $\alpha'$, one
can consider the distribution of $y' = \alpha'^e \bmod N$. From (2) we have:

$$y' = y + v \cdot p \bmod N$$

where $v$ is uniformly distributed modulo $q$ and $y$ is uniformly distributed in
$[0, 2^{k-1})$. The following lemma shows that the distribution of $y'$ is statistically
close to uniform in $\mathbb{Z}_N$.


Lemma 1. Let $N = pq$ be a $k$-bit modulus where $p$ and $q$ are $k/2$-bit, and let $y$
be a random integer such that $0 \le y < 2^{k-1}$. Let $v$ be a random integer modulo $q$.
Then the distribution of $y' = y + v \cdot p \bmod N$ is $\varepsilon$-statistically close to uniform
modulo $N$, with $\varepsilon = 4 / 2^{k/2}$.


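Proof. We consider a fixed $a \in \mathbb{Z}_N$ and we provide an estimate of $\Pr[y' = a]$.
For this we consider the solutions of the equation:

$$a = y + v \cdot p \bmod N \qquad (3)$$

We have that for every $v \in [0, q)$, there exists a unique $y \in [0, N)$ which
satisfies the above relation. However we are only interested in the $y$'s in the range
$[0, 2^{k-1})$. We have that for each $i \in [1, q]$, the pair:

$$(v = q - i, \quad y = a + i p \bmod N)$$

is a solution of (3) iff

$$a + i p \bmod N < 2^{k-1} \qquad (4)$$

Depending on the choice of $a$, there are actually either $\lfloor 2^{k-1}/p \rfloor$ or
$\lfloor 2^{k-1}/p \rfloor + 1$ many $i$ values which satisfy relation (4). Hence there are
$\lfloor 2^{k-1}/p \rfloor$ or $\lfloor 2^{k-1}/p \rfloor + 1$ many solutions to congruence (3) such that
$y < 2^{k-1}$. Since $y$ and $v$ are random integers in the range $[0, 2^{k-1})$ and $[0, q)$
respectively, this gives:

$$\left\lfloor \frac{2^{k-1}}{p} \right\rfloor \cdot \frac{1}{2^{k-1}} \cdot \frac{1}{q} \;\le\; \Pr[y' = a] \;\le\; \left( \left\lfloor \frac{2^{k-1}}{p} \right\rfloor + 1 \right) \cdot \frac{1}{2^{k-1}} \cdot \frac{1}{q}$$

We write $\lfloor 2^{k-1}/p \rfloor = c$, which gives $p \cdot c < 2^{k-1} < p \cdot c + p$. We obtain:

$$\Pr[y' = a] \ge \frac{c}{2^{k-1} \cdot q} = \frac{1}{N} \left( 1 - \frac{2^{k-1} - pc}{2^{k-1}} \right)$$

$$\ge \frac{1}{N} \left( 1 - \frac{p}{2^{k-1}} \right) \qquad (\text{as } 2^{k-1} < pc + p)$$

$$\ge \frac{1}{N} \left( 1 - \frac{2p}{N} \right) \qquad (\text{as } 2^{k-1} > N/2)$$

Similarly, we have:

$$\Pr[y' = a] \le \frac{c + 1}{2^{k-1} \cdot q} \le \frac{1}{N} \left( 1 + \frac{p}{2^{k-1}} \right) \le \frac{1}{N} \left( 1 + \frac{2p}{N} \right)$$

This gives:

$$\left( 1 - \frac{2}{q} \right) \cdot \frac{1}{N} \;\le\; \Pr[y' = a] \;\le\; \left( 1 + \frac{2}{q} \right) \cdot \frac{1}{N}$$

for all $a \in [0, N)$. This implies that the distribution of $y'$ is $\frac{2}{q}$-statistically
close to uniform modulo $N$, and $2/q < 4/2^{k/2}$ as $q > 2^{k/2-1}$.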
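
The bounds in this proof can be checked exhaustively on a toy instance. The sketch below assumes the small primes p = 13 and q = 11 (so k = 8), chosen only so that every pair (y, v) can be enumerated; the function name `faulty_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def faulty_distribution(p: int, q: int, k: int) -> dict:
    """Exact distribution of y' = y + v*p mod N for uniform y in [0, 2^(k-1)) and v in [0, q)."""
    N = p * q
    counts = Counter((y + v * p) % N for y in range(2 ** (k - 1)) for v in range(q))
    total = 2 ** (k - 1) * q
    return {a: Fraction(counts[a], total) for a in range(N)}

p, q, k = 13, 11, 8               # toy parameters (assumption): 4-bit primes, 8-bit N
N = p * q
dist = faulty_distribution(p, q, k)

c = 2 ** (k - 1) // p             # floor(2^(k-1) / p)
solution_counts = {pr * 2 ** (k - 1) * q for pr in dist.values()}
lower = (1 - Fraction(2, q)) / N
upper = (1 + Fraction(2, q)) / N
stat_distance = sum(abs(pr - Fraction(1, N)) for pr in dist.values()) / 2
```

Using exact fractions avoids any floating-point doubt about whether the inequalities hold at the boundary.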


Lemma 1 shows that it is sufficient for our simulator to return a random $\alpha'$
modulo $N$ as the faulty signature. In other words, instead of first generating a
random $y \in [0, 2^{k-1})$, then a random $v$ modulo $q$, then $y' = y + v \cdot p$ and finally
$\alpha' = y'^d \bmod N$, the simulator can simply output a random $\alpha'$ modulo $N$, and
such output will be statistically indistinguishable from a faulty signature.

However to this faulty signature $\alpha'$ corresponds a correct signature $\alpha$ such
that:

$$\alpha = \alpha' - u \cdot p \bmod N$$

where $u$ is randomly distributed modulo $q$. Equivalently, letting $y' = \alpha'^e \bmod N$,
there exists a corresponding value $y$ with:

$$y = y' - v \cdot p \bmod N \qquad (5)$$

where $v$ is randomly distributed modulo $q$ such that $y$ can be written as:

$$y = 0 \| w \| s \| t = \mu(m, r)$$

This implicitly defines $h(m, r) = w$, $g_1(w) = s \oplus r$ and $g_2(w) = t$ for the simulation
of random oracles $h$, $g_1$ and $g_2$.

Since our simulator does not know $p$, it cannot compute $y$ in equation (5)
and therefore our simulator does not know the corresponding values of $w$, $s$
and $t$; therefore our simulator cannot answer the corresponding $h$ queries, $g_1$
queries and $g_2$ queries if such queries are made by the attacker. Intuitively for
$h$-queries it is sufficient that the set of $r$ values is exponentially large; for this
the parameter $k_0$ must be large enough. For $g_1$ and $g_2$ queries we must show
that the adversary has a negligible probability of querying $w$. This is shown in
the following lemma: we show that given a faulty signature $\alpha'$ (or equivalently
$y' = \alpha'^e \bmod N$) the distribution of $w$ has enough variability, if the parameter
$k_1$ is sufficiently large. This implies that $w$ does not need to be computed, and
therefore the factorization of $N$ is not needed for our simulation.


Lemma 2. Let $N = pq$ be a $k$-bit modulus where $p$ and $q$ are $k/2$-bit, and let $y$
be a random integer such that $0 \le y < 2^{k-1}$. Let $v$ be a random integer modulo
$q$, and let $y' = y + v \cdot p \bmod N$. Write $y = 0 \| w \| x$ where $w$ is $k_1$-bit and $x$ is
$k - k_1 - 1$ bits. Given $y'$, for any $w'$ of $k_1$ bits we have:

$$\Pr[w = w' \mid y'] \le \frac{8}{2^{\min(k_1, k/2)}}$$


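Proof. We have that:

$$\Pr[w = w' \mid y'] = \frac{\#\{(y, v) \text{ pairs s.t. } y' = y + v \cdot p \bmod N \text{ and } y = 0 \| w' \| x\}}{\#\{(y, v) \text{ pairs s.t. } y' = y + v \cdot p \bmod N \text{ and } 0 \le y < 2^{k-1}\}}$$

For a fixed $v$, the value $y \bmod N$ gets fixed by the relation $y' = y + v \cdot p \bmod N$.
Moreover at least $\lfloor q/2 \rfloor$ of the possible $v$ values give $y \bmod N$ in the desired range
between 0 and $2^{k-1}$. Hence the denominator of the above fraction can be lower
bounded by $\lfloor q/2 \rfloor$.

We have that for a fixed $y'$, the value of $y$ is fixed modulo $p$; hence for a fixed
$w'$ with $y = 0 \| w' \| x$, the value of $x$ is also fixed modulo $p$. As $x$ is
$(k - k_1 - 1)$-bit, over $\mathbb{Z}$ there can be at most $\lceil 2^{k-k_1-1}/p \rceil$ many possible $x$
values. Hence the numerator of the above fraction can be upper bounded by
$\lceil 2^{k-k_1-1}/p \rceil$.
Hence we have, using $p, q \ge 2^{k/2-1}$,

$$\Pr[w = w' \mid y'] \le \frac{\lceil 2^{k-k_1-1}/p \rceil}{\lfloor q/2 \rfloor} \le \frac{2^{k/2-k_1} + 1}{2^{k/2-2}} = \frac{4}{2^{k_1}} + \frac{4}{2^{k/2}} \le \frac{8}{2^{\min(k_1, k/2)}}$$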
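
As a sanity check of Lemma 2, the conditional probability can be computed exactly on a toy instance. The sketch assumes the 6-bit primes p = 53 and q = 59 (so k = 12) and k_1 = 5; these values and the function name `max_conditional_probability` are illustrative choices, picked so that the bound 8 / 2^min(k_1, k/2) is below 1 and the enumeration stays small.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def max_conditional_probability(p: int, q: int, k: int, k1: int) -> Fraction:
    """Largest value of Pr[w = w' | y'] over all y' and w', by exhaustive enumeration."""
    N = p * q
    x_bits = k - 1 - k1                      # y = 0 || w || x with x of k - k1 - 1 bits
    by_y_prime = defaultdict(Counter)
    for y in range(2 ** (k - 1)):
        w = y >> x_bits
        for v in range(q):
            by_y_prime[(y + v * p) % N][w] += 1
    return max(
        Fraction(max(counts.values()), sum(counts.values()))
        for counts in by_y_prime.values()
    )

p, q, k, k1 = 53, 59, 12, 5                  # toy parameters (assumption)
worst = max_conditional_probability(p, q, k, k1)
bound = Fraction(8, 2 ** min(k1, k // 2))
```

On such small parameters the bound is loose; the check only confirms that the inequality holds, not that it is tight.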


Formally, we obtain the following theorem:


Theorem 1. Assume that no algorithm can invert RSA in time $t'$ with
probability better than $\varepsilon'$. Then the signature scheme PSS[$k_0$, $k_1$] is
$(t, q_h, q_g, q_s, q_{fs}, \varepsilon)$-secure, where

$$t(k) = t'(k) - [q_s(k) + q_g(k) + q_h(k) + 1] \cdot k_0 \cdot O(k^3)$$

$$\varepsilon(k) = \varepsilon'(k) + (q_s + q_{fs} + 1) \cdot (q_s + q_{fs} + q_h) \cdot 2^{-k_0} + 8 \cdot q_g \cdot q_{fs} \cdot 2^{-\min(k_1, k/2)}$$
$$\qquad + (q_h + q_s + q_{fs}) \cdot (q_h + q_g + q_s + q_{fs} + 1) \cdot 2^{-k_1}$$
$$\qquad + q_h \cdot q_{fs} \cdot 2^{-k_0} + 4 \cdot q_{fs} \cdot 2^{-k/2}$$

Here the attacker can make at most $q_h$, $q_g$, $q_s$ and $q_{fs}$ $h$ queries, $g$ queries,
signature queries and fault signature queries respectively.
Proof. We use a simulator which behaves in exactly the same way as in the original PSS
security proof [2]; in addition it answers fault queries with a uniformly random
integer modulo $N$. Now if the attacker is successful against our simulator then
we break the RSA challenge $(N, e, \eta)$ as in the original paper.

We must show that any attacker which is successful against the original attack
scenario will be successful against our simulator. For that, we use a sequence of
games. We start with Game$_0$, which is exactly the attack scenario, and which
requires knowledge of the factorization of $N$. Then we progressively modify the game, so that
eventually knowledge of the factorization of $N$ is not needed anymore. We denote
by $S_i$ the event that the attacker succeeds in Game$_i$.


Game$_0$: this is the attack scenario. We answer signature queries as specified in
the signature generation algorithm, using the private exponent $d$. We simulate
the faulty signature queries by first generating a correct signature $\sigma$ and then
computing $\sigma' = \sigma + u \cdot p \bmod N$ for a random $u$ modulo $q$. In the following for
simplicity we only consider faulty signatures modulo $q$; faulty signatures modulo
$p$ are simulated in exactly the same way.


Game$_1$: we abort if there is a collision for $w$ at Step 2 of the signature generation
algorithm, or if the random $r$ used during signature generation has already
appeared before. We call this event $A_1$. More precisely, event $A_1$ happens if one of
the following is true:


— The random r used in a signature oracle or faulty signature oracle query 
collides with either 1) the r used in a previous signature oracle or faulty 
signature oracle query or 2) the r used in a previous h oracle query. 

— The h function output in a signature oracle or faulty signature oracle query 
collides with either 1) the h function outputs in previous signature oracle or 
faulty signature oracle queries or 2) with a previous h oracle query output 
or 3) a previous g oracle query input. 

— The h oracle query output collides with either 1) a h function output in 
previous signature oracle or faulty signature oracle query or 2) a previous h 
oracle query output or 3) a previous g oracle query input. 


We obtain:

$$\Pr[A_1] \le (q_s + q_{fs}) \cdot (q_s + q_{fs} + q_h) \cdot 2^{-k_0} + (q_h + q_s + q_{fs}) \cdot (q_h + q_g + q_s + q_{fs}) \cdot 2^{-k_1}$$

and:

$$|\Pr[S_1] - \Pr[S_0]| \le \Pr[A_1]$$

Game$_2$: we construct a similar simulator as in the original PSS security proof [2];
however to deal with faulty signature queries we continue to use the factorization
of $N$.

The simulator receives as input a challenge $\eta$ and must output $\eta^d \bmod N$.
When receiving a signature query, the simulator generates a random $\alpha$ modulo
$N$ such that $\alpha^e \bmod N$ can be written as $0 \| w \| s \| t$. The simulator generates a
random $r$ of $k_0$ bits. Then it lets $h(m, r) = w$, $g_1(w) = s \oplus r$ and $g_2(w) = t$.
When receiving a hash query for $h(m, r)$, the challenger generates a random
$\alpha$ modulo $N$ such that $\eta \cdot \alpha^e \bmod N$ can be written as $0 \| w \| s \| t$; it then defines
$h(m, r) = w$, $g_1(w) = s \oplus r$ and $g_2(w) = t$ as previously. The queries to $g_1$ and
$g_2$ are simulated by returning a random value for every new input.
To simulate the faulty signature oracle, one first generates as above a random
$\alpha$ modulo $N$ such that $\alpha^e \bmod N$ can be written as $0 \| w \| s \| t$. The simulator
generates a random $r$ of $k_0$ bits. Then it lets $h(m, r) = w$, $g_1(w) = s \oplus r$ and
$g_2(w) = t$. Then instead of returning $\alpha$, the simulator generates a random $u$
modulo $q$, and outputs:

$$\alpha' = \alpha + u \cdot p \bmod N \qquad (6)$$


In Game$_2$ we abort as in Game$_1$, and additionally in the following case: while
generating a random $\alpha$ modulo $N$ such that $\alpha^e \bmod N$ can be written as $0 \| w \| s \| t$
during signature or faulty signature queries (and similarly for $h(m, r)$ queries),
we stop after trying $k_0 + 1$ times. This adds $(q_h + q_s + q_{fs}) \cdot 2^{-k_0}$ to the error
term:

$$|\Pr[S_2] - \Pr[S_1]| \le (q_h + q_s + q_{fs}) \cdot 2^{-k_0}$$


Game$_3$: we abort if the attacker makes a query for $g(w)$ where $w$ was used in a
faulty signature for message $m$ and random $r$, while the attacker has not made
a query to $h(m, r)$ before. We define this event as $A_3$. As all the query answers
are simulated independently, from Lemma 2 this gives:

$$|\Pr[S_3] - \Pr[S_2]| \le \Pr[A_3] \le q_g \cdot q_{fs} \cdot \frac{8}{2^{\min(k_1, k/2)}}$$

Game$_4$: we abort if the attacker makes a query for $h(m, r)$ where $r$ was used to
generate a faulty signature with $w$, while the attacker has not made a query
before to $g(w)$. In this case the attacker's view is independent from $r$, which
gives:

$$|\Pr[S_4] - \Pr[S_3]| \le q_h \cdot q_{fs} \cdot 2^{-k_0}$$


Game$_5$: we abort if the attacker makes a query for $h(m, r)$ where $r$ was used to
generate a faulty signature, or if the attacker makes a query for $g(w)$ where $w$ was
used in a faulty signature. Game$_5$ is the same as Game$_4$ since for a faulty signature
on $m$ with random $r$ and $w$, either the attacker starts with an $h(m, r)$ query or it
starts with a $g(w)$ query.

$$\Pr[S_5] = \Pr[S_4]$$


Game$_6$: we change the way the faulty signature oracle is simulated. Instead of
first generating $\alpha$ and then $\alpha'$ as in equation (6), we first generate a uniformly
random $\alpha'$ and then a random $u$ modulo $q$ such that $\alpha^e \bmod N$, with
$\alpha = \alpha' - u \cdot p \bmod N$, can be written as $0 \| w \| s \| t$. From Lemma 1 we have:

$$|\Pr[S_6] - \Pr[S_5]| \le q_{fs} \cdot \frac{4}{2^{k/2}}$$


Game$_7$: since we do not answer the queries for $h(m, r)$ where $r$ was used to
generate a faulty signature, and the queries for $g(w)$ where $w$ was used in a
faulty signature, we do not need to compute $w$. Therefore, we do not need to
compute a random $u$ modulo $q$ such that $\alpha^e \bmod N$ can be written as $0 \| w \| s \| t$.
Therefore we do not need to know the factorization of $N$ anymore, and we have:

$$\Pr[S_7] = \Pr[S_6]$$


Finally, if the adversary outputs a forgery with probability at least $\varepsilon$ in Game$_0$,
then the adversary must output a forgery with probability at least
$\varepsilon - |\Pr[S_7] - \Pr[S_0]|$ in Game$_7$. As in the original PSS security proof, from this
forgery we can solve the RSA challenge with probability at least:

$$\varepsilon' = \varepsilon - |\Pr[S_7] - \Pr[S_0]| - 2^{-k_1}$$

Combining the previous inequalities, we obtain the bound on $\varepsilon(k)$ stated in the theorem.
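
To see how the individual game hops add up, the following sketch sums them with exact rational arithmetic and compares the total with the loss terms of Theorem 1. The function names `game_hop_total` and `theorem_loss` and the sample query counts are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def inv_pow2(n: int) -> Fraction:
    """Exact value of 2^(-n)."""
    return Fraction(1, 2 ** n)

def game_hop_total(k, k0, k1, qh, qg, qs, qfs) -> Fraction:
    """Sum of the differences bounded in Games 1 to 7, plus the final 2^(-k1) term."""
    hops = [
        (qs + qfs) * (qs + qfs + qh) * inv_pow2(k0)
        + (qh + qs + qfs) * (qh + qg + qs + qfs) * inv_pow2(k1),  # Game 1
        (qh + qs + qfs) * inv_pow2(k0),                           # Game 2
        8 * qg * qfs * inv_pow2(min(k1, k // 2)),                 # Game 3
        qh * qfs * inv_pow2(k0),                                  # Game 4
        4 * qfs * inv_pow2(k // 2),                               # Game 6
        inv_pow2(k1),                                             # final extraction step
    ]
    return sum(hops)

def theorem_loss(k, k0, k1, qh, qg, qs, qfs) -> Fraction:
    """epsilon(k) - epsilon'(k) as stated in Theorem 1."""
    return (
        (qs + qfs + 1) * (qs + qfs + qh) * inv_pow2(k0)
        + 8 * qg * qfs * inv_pow2(min(k1, k // 2))
        + (qh + qs + qfs) * (qh + qg + qs + qfs + 1) * inv_pow2(k1)
        + qh * qfs * inv_pow2(k0)
        + 4 * qfs * inv_pow2(k // 2)
    )

# Hypothetical query budgets for k = 1024, k0 = k1 = 128.
sample = dict(k=1024, k0=128, k1=128, qh=2**60, qg=2**60, qs=2**30, qfs=2**30)
```

Games 5 and 7 contribute nothing because they leave the success probability unchanged; the theorem's expression is slightly larger than the raw sum because it groups terms.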


4 PSS-R Is Secure against Fault Attacks 


In PSS-R or PSS with message recovery the goal is to save bandwidth such that 
the message is recoverable from the signature; hence it is not necessary to send 
the message separately. 


4.1 The PSS-R Scheme 


We recall the definition of the PSS-R scheme [2]. The scheme uses three hash
functions $h : \{0,1\}^* \to \{0,1\}^{k_1}$, $g_1 : \{0,1\}^{k_1} \to \{0,1\}^{k_0}$ and
$g_2 : \{0,1\}^{k_1} \to \{0,1\}^{k - k_0 - k_1 - 1}$, where $k$, $k_0$ and $k_1$ are the parameters.


Key Generation: generate a $k$-bit RSA modulus $N = pq$, and a random
exponent $e \in \mathbb{Z}^*_{\phi(N)}$. Generate $d$ such that $e \cdot d = 1 \bmod \phi(N)$. The public key is
$(N, e)$; the private key is $(N, d)$.

Signature generation: given a message $m$, do the following:

1. $r \leftarrow \{0,1\}^{k_0}$

2. $w \leftarrow h(m \| r)$

3. $r^* \leftarrow g_1(w) \oplus r$

4. $m^* \leftarrow g_2(w) \oplus m$

5. $y \leftarrow 0 \| w \| r^* \| m^*$

6. Return $\sigma = y^d \bmod N$


Message Recovery: given a signature $\sigma$, do the following:

1. Let $y = \sigma^e \bmod N$

2. Parse $y$ as $0 \| w \| r^* \| m^*$. If the parsing fails return REJECT.

3. $r \leftarrow r^* \oplus g_1(w)$

4. $m \leftarrow m^* \oplus g_2(w)$

5. If $h(m \| r) = w$ return $m$.

6. else return REJECT.
Fig. 2. PSS-R: the components of the image $y = 0 \| w \| r^* \| m^*$ are darkened. The
signature of $m$ is $y^d \bmod N$.
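
The PSS-R signing and message recovery steps can be sketched in the same toy setting. As before, the Mersenne primes, the sizes k_0 = k_1 = 64 and the SHAKE-256-based stand-in `oracle` for the random oracles are assumptions made for illustration; messages are modelled as integers of at most k - k_0 - k_1 - 1 bits, and `recover` returns None where the scheme returns REJECT.

```python
import hashlib
import secrets

p, q = 2**127 - 1, 2**89 - 1      # toy primes (assumption)
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
k = N.bit_length()
k0, k1 = 64, 64                   # illustrative sizes of r and w
k2 = k - k0 - k1 - 1              # size of the recoverable message part

def oracle(tag: bytes, data: bytes, bits: int) -> int:
    """Hypothetical stand-in for a random oracle: SHAKE-256 truncated to `bits` bits."""
    nbytes = (bits + 7) // 8
    digest = hashlib.shake_256(tag + data).digest(nbytes)
    return int.from_bytes(digest, "big") >> (8 * nbytes - bits)

def to_bytes(x: int, bits: int) -> bytes:
    return x.to_bytes((bits + 7) // 8, "big")

def h(m: int, r: int) -> int:
    return oracle(b"h", to_bytes(m, k2) + to_bytes(r, k0), k1)

def g1(w: int) -> int:
    return oracle(b"g1", to_bytes(w, k1), k0)

def g2(w: int) -> int:
    return oracle(b"g2", to_bytes(w, k1), k2)

def sign_with_recovery(m: int) -> int:
    r = secrets.randbits(k0)
    w = h(m, r)
    r_star = g1(w) ^ r
    m_star = g2(w) ^ m
    y = (w << (k0 + k2)) | (r_star << k2) | m_star   # y = 0 || w || r* || m*
    return pow(y, d, N)

def recover(sigma: int):
    y = pow(sigma, e, N)
    if y >> (k - 1):                                 # leading bit must be 0
        return None
    w = y >> (k0 + k2)
    r = ((y >> k2) & ((1 << k0) - 1)) ^ g1(w)
    m = (y & ((1 << k2) - 1)) ^ g2(w)
    return m if h(m, r) == w else None

message = int.from_bytes(b"pay 10 EUR", "big")       # 80 bits, fits in k2 = 87 bits
sigma = sign_with_recovery(message)
```

Only the signature is transmitted; the verifier obtains both r and m from it, which is the bandwidth saving PSS-R aims for.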


4.2 Security Proof 


Theorem 2. Assume that no algorithm can invert RSA in time $t'$ with
probability better than $\varepsilon'$. Then the signature scheme PSS-R[$k_0$, $k_1$] is
$(t, q_h, q_g, q_s, q_{fs}, \varepsilon)$-secure, where:

$$t(k) = t'(k) - [q_s(k) + q_g(k) + q_h(k) + 1] \cdot k_0 \cdot O(k^3)$$

$$\varepsilon(k) = \varepsilon'(k) + (q_s + q_{fs} + 1) \cdot (q_s + q_{fs} + q_h) \cdot 2^{-k_0} + 8 \cdot q_g \cdot q_{fs} \cdot 2^{-\min(k_1, k/2)}$$
$$\qquad + (q_h + q_s + q_{fs}) \cdot (q_h + q_g + q_s + q_{fs} + 1) \cdot 2^{-k_1}$$
$$\qquad + q_h \cdot q_{fs} \cdot 2^{-k_0} + 4 \cdot q_{fs} \cdot 2^{-k/2}$$

Here the attacker can make at most $q_h$, $q_g$, $q_s$ and $q_{fs}$ $h$ queries, $g$ queries,
signature queries and fault signature queries respectively.


Proof. The proof of this theorem is very similar to that of Theorem 1 and hence
is omitted.


5 Conclusion 


We obtain from the previous theorems that unless the attacker is making more
fault oracle queries than hash oracle queries, one gets the same security bound
as in the original PSS proof without fault oracle. We note that in practice fault
queries are usually more expensive than hash queries, since those hash queries
can be made offline when a concrete hash function is used.

In [6] a better security bound was given for PSS (without fault oracle). It was
shown that the random size $k_0$ could be taken as small as $\log_2 q_s$, where $q_s$ is
the maximum number of signature queries; with $q_s = 2^{30}$ this gives $k_0 = 30$ bits.
However with a fault oracle one cannot take such a small $k_0$, since in this case
the random $r$ could be recovered by exhaustive search and the Bellcore attack
would still apply.

In summary, any parameters chosen according to the bounds in the original
PSS paper [2] give the same level of security against fault attacks. One can take
$k = 1024$, $k_0 = k_1 = 128$ as in [2].
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Abstract. Cache-timing attacks are a serious threat to security-critical 
software. We show that the combination of vector quantization and hid- 
den Markov model cryptanalysis is a powerful tool for automated analysis 
of cache-timing data; it can be used to recover critical algorithm state 
such as key material. We demonstrate its effectiveness by running an 
attack on the elliptic curve portion of OpenSSL (0.9.8k and under). This 
involves automated lattice attacks leading to key recovery within hours. 
We carry out the attack on live cache-timing data without simulating 
the side channel, showing these attacks are practical and realistic. 


Keywords: cache-timing attacks, side channel attacks, elliptic curve 
cryptography. 


1 Introduction 


Traditional cryptanalysis views cryptographic systems as mathematical abstrac- 
tions, which can be attacked using only the input and output data of the system. 
As opposed to attacks on the formal description of the system, side channel at- 
tacks are based on information that is gained from the physical implemen- 
tation of the system. Side channel leakages might reveal information about the 
internal state of the system and can be used in conjunction with other crypt- 
analytic techniques to break the system. Side channel attacks can be based on 
information obtained from, for example, power consumption, timings, electro- 
magnetic radiation or even sound. Active attacks in which the attacker manip- 
ulates the operation of the system by physical means are also considered side 
channel attacks. 

Our focus is on cache-timing attacks in which side channel information is
gained by measuring cache access times; these are trace-driven attacks [3]. We
place importance on automated analysis for processing large volumes of
cache-timing data over many executions of a given algorithm. Hidden Markov models
(HMMs) provide a framework in which the relationship between side channel
observations and the internal states of the system can be naturally modeled. HMMs
for side channel analysis were previously studied by Oswald [4], and models for
key inference were given by Karlof and Wagner [5] and Green et al. [6]. While their
proposed models make use of an abstract side channel, we are concerned with
concrete cache-timing data here.

The analysis additionally makes use of Vector Quantization (VQ) for classi- 
fication. Cache-timing data is viewed as vectors that are matched to predefined 
templates, obtained by inducing the algorithm to perform in an unnatural man- 
ner. This can often easily be accomplished in software. 

Abstractly, it is reasonable to consider the analysis shown here as a form of 
template attack [7] used in power analysis of symmetric cryptographic primitive 
implementations, and more recently for asymmetric primitives [8]. Chari et al. [7] 
formalize exactly what a template is: A precise model for the noise and expected 
signal for all possible values for part of the key. Their attack is then carried out 
iteratively to recover successive parts of the key. 

It is difficult and not particularly prudent to model cache-timing attacks ac- 
cordingly. In lieu of such explicit formalization, we borrow from them in name 
and in spirit: The attacker has some device or code in their possession that they 
can give input to, program, or modify in some way that forces it to perform in 
a certain manner, while at the same time obtaining measurements from the side 
channel. 

Using the described analysis method, we carry out an attack on the elliptic 
curve portion of OpenSSL (0.9.8k). Within hours, we are able to recover the 
long-term private key in ECDSA by observing cache-timing data, signatures, and 
messages. Our attack exploits a weakness that stems from the use of a low-weight 
signed representation for scalars during point multiplication. The algorithm uses 
a precomputation table of points that are accessed during point addition steps. 
The lookups are reflected in the cache-timings, leaking critical algorithm state. 
A significant fraction of ECDSA nonce portions can be determined this way. 
Given enough such information, we are able to recover the private key using a 
lattice attack. 

The paper is structured as follows. In Sect. 2 we give background on cache
architectures and various published cache attacks. In Sect. 3 we review elliptic
curve cryptography and the implementation in OpenSSL. Section 4 covers VQ
and how to apply it effectively to cache-timing data analysis. In Sect. 5 we
discuss HMMs and describe how they are used in our attack, but also how they
can be used to facilitate side channel attacks in general. We present our results
in Sect. 6 and countermeasures briefly in Sect. 7. We conclude in Sect. 8.


2 Cache Attacks 


We begin with a brief review of modern CPU cache architectures. This is followed
by a selective literature review of cache attacks on cryptosystem implementations.


2.1 Data Caches 


A CPU has a limited number of working registers to store data. Modern processors
are equipped with a data cache to offset the high latency of loading data
from main memory into these registers. When the CPU needs to access data,
it first looks in the data cache, which is faster but has smaller capacity than
main memory. If it finds the data in the cache, it is loaded with minimal latency
and this is known as a cache hit; otherwise, a cache miss occurs and the latency
is higher as the data is fetched from successive layers of caches or even main
memory. Thus access to frequently used data has lower latency. Cache layers L1,
L2, and L3 are commonplace, increasing in capacity and latency. We focus on
data caches here, but processors often have an instruction cache as well.

The cache replacement policy determines where data from main memory is 
stored in the cache. At opposite ends of the spectrum are a fully-associative cache 
and a direct mapped cache. Respectively, these allow data from a given memory 
location to be stored in any location or one location in the cache. The trade-off 
is between complexity and latency. A compromise is an N-way associative cache, 
where each location in memory can be stored in one of N different locations in 
the cache. The cache locations, or lines, then form a number of associative sets 
or congruency classes. 

We give the L1 data cache details for the two example processors under con- 
sideration here. 


Intel Atom. The L1 data cache consists of 384 lines of 64B each for a total of 
24KB. It is 6-way associative, thus the lines are divided into 64 associative 
sets. 

Intel Pentium 4. The L1 data cache consists of 128 lines of 64B each for a total 
of 8KB. It is 4-way associative, thus the lines are divided into 32 associative 
sets. 
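
The set counts quoted above follow directly from the line count and the associativity. The short Python sketch below redoes that arithmetic; the `set_index` helper additionally assumes the common "line number modulo number of sets" placement rule, which the text does not state explicitly, and all function names are illustrative.

```python
def associative_sets(num_lines: int, ways: int) -> int:
    """Number of associative sets in an N-way set-associative cache."""
    if num_lines % ways != 0:
        raise ValueError("line count must be a multiple of the associativity")
    return num_lines // ways


def set_index(address: int, line_size: int, num_sets: int) -> int:
    """Set an address maps to, ASSUMING the 'line number mod sets' rule."""
    return (address // line_size) % num_sets


# L1 data cache figures quoted in the text: (lines, line size in bytes, ways).
CACHES = {"Intel Atom": (384, 64, 6), "Intel Pentium 4": (128, 64, 4)}

for name, (lines, line_size, ways) in CACHES.items():
    sets = associative_sets(lines, ways)
    print(f"{name}: {lines * line_size // 1024} KB, {sets} sets")
```

Under that placement assumption, two addresses whose line numbers differ by a multiple of the set count compete for the same set, which is what lets one thread evict another thread's data.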


We focus on these because they implement Intel’s HyperThreading, a form of 
Simultaneous Multithreading (SMT) that allows active execution of multiple 
threads concurrently. In a cache-timing attack scenario, this relaxes the need to 
force context switches since the threads naturally compete for shared resources 
during execution, such as the data caches. The newly-released (Nov. 2008) Intel 
i7 also features HyperThreading; it has the same number of associative sets as 
the Intel Atom. 


2.2 Published Attacks 


Percival demonstrated a cache-timing attack on OpenSSL 0.9.7c (30 Sep.
2003) where a classical sliding window was used twice for exponentiation for two
512-bit exponents in combination with the CRT to carry out a 1024-bit RSA
encryption operation. Sliding window exponentiation computes $\beta^e$ by sliding
a width-$w$ window across $e$ with placement such that the value falling in the
window is odd. It then uses a precomputation table $\beta^i$ for all odd
$1 \le i < 2^w$, accessed during multiplication steps; this lookup is reflected in
the cache-timings, demonstrated on a Pentium 4 with HyperThreading. The sequence
of squarings and multiplications yields significant key data: recovery of 200 bits
out of each 512-bit exponent, and Percival claimed an additional 110 bits from
each exponent due to fixed memory access patterns revealing information about the
index to the precomputation table and thus key data. Assuming the absence of
errors, Percival reasoned how this allows the RSA modulus to be factored in
reasonable time. OpenSSL responded to the vulnerability in 0.9.7h (11 Oct. 2005)
by modifying the exponentiation routine.

Hlavac and Rosa used a similar approach to demonstrate a lattice attack 
on DSA signatures with known nonce portions. They estimated that after ob- 
serving 6 authentications to an OpenSSH server, which uses OpenSSL (< 0.9.7h) 
for DSA signatures, an attacker will have a high success probability when run- 
ning a lattice attack to recover the private key. They state that the side channel 
was emulated for the experiments. 

The numerous published attacks against secret key implementations are
noteworthy. Among others, these include attacks on AES by Bernstein [1] and Osvik
et al. [2]. Both papers present key recovery attacks on various implementations.


3 Elliptic Curve Cryptography 


To demonstrate the effectiveness of the analysis method, we will look at one 
particular implementation of ECC. We stress that the scope of the analysis is 
much larger; this is merely one example of how it can be used. 

Given a point P on an elliptic curve and scalar k, scalar multiplication com- 
putes kP. This operation is the performance benchmark for an elliptic curve 
cryptosystem. It is normally carried out using a double-and-add approach, of 
which there are many varieties. We outline a common one later in this section. 

Our attack is demonstrated on an implementation of scalar multiplication used 
by ECDSA signature generation. A signature (r,s) on a message m is produced 
using 


$$r = x(kG) \bmod n \tag{1}$$
$$s = k^{-1}(h(m) + rd) \bmod n \tag{2}$$


with point $G$ of order $n$, nonce $k$ chosen uniformly from $[1, n)$, $x(P)$ the
projection of $P$ to its x-coordinate, $h$ a collision-resistant hash function,
and $d$ the long-term private key corresponding to the public key $D = dG$.
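
Equation (2) also shows why information about the nonce is so damaging: if $k$ were fully known for a single signature, $d = r^{-1}(sk - h(m)) \bmod n$. The Python sketch below checks this algebra with toy numbers. It is only an illustration of the fully-known-nonce case, not the lattice attack on partial nonces used in this paper; the modulus and all values are hypothetical, and `r` is an arbitrary stand-in rather than a real $x(kG)$.

```python
def sign_s(k: int, h: int, r: int, d: int, n: int) -> int:
    """Equation (2): s = k^{-1} (h(m) + r d) mod n."""
    return pow(k, -1, n) * (h + r * d) % n


def recover_private_key(k: int, h: int, r: int, s: int, n: int) -> int:
    """Solve equation (2) for d when the nonce k is known."""
    return (s * k - h) * pow(r, -1, n) % n


n = 2**61 - 1          # toy prime group order (assumption, not a real curve)
d = 0x1234567890ABCDE  # hypothetical private key, d < n
k = 0x0FEDCBA98765432  # hypothetical nonce
h = 0x0AAAAAAAAAAAAAA  # stand-in for h(m)
r = 0x055555555555555  # stand-in for x(kG) mod n

s = sign_s(k, h, r, d, n)
assert recover_private_key(k, h, r, s, n) == d
```

The three-argument `pow` with exponent `-1` computes a modular inverse and requires Python 3.8 or later.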


3.1 ECC in OpenSSL 


OpenSSL treats two cases of elliptic curves over binary and prime fields sepa- 
rately and implements scalar multiplication in two ways accordingly. We con- 
sider only the latter case, where a general multi-exponentiation algorithm is 
used [3M2]. The algorithm works left-to-right and uses interleaving, where one 
accumulator is used for the result and point doublings are shared; low-weight 
signed representations are used for individual scalars. 

When only one scalar is passed, as in (1) or when creating a signature using
the OpenSSL command line tool, it reduces to a rather textbook version of scalar
multiplication, in this case using the modified Non-Adjacent Form mNAF$_w$ (see,
for example, [15]). This is reflected in the pseudocode below. OpenSSL has the
ability to store the precomputed points in memory, so with a fixed $P$ such as a
generator they need not necessarily be recomputed for each invocation.

The representation mNAF$_w$ is very similar to the regular windowed NAF.
Each non-zero coefficient is followed by at least $w - 1$ zero coefficients, except
for the most significant digit which is allowed to violate this condition in some
cases to reduce the length of the representation by one while still retaining the
same weight. Considering the MSBs of NAF$_w$, one applies
$1\,0^{w-1}\,\delta \mapsto 0\,1\,0^{w-2}\,\epsilon$ where $\delta < 0$ and
$\epsilon = 2^{w-1} + \delta$ when possible to obtain mNAF$_w$.


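Algorithm: Scalar Multiplication
Input: $k \in \mathbb{Z}$, $P \in E(\mathbb{F}_p)$, width $w$
Output: $kP$
  $(k_{\ell-1} \ldots k_0) \leftarrow$ mNAF$_w(k)$
  Precompute $iP$ for all odd $0 < i < 2^{w-1}$
  $Q \leftarrow k_{\ell-1} P$
  for $i \leftarrow \ell - 2$ to $0$ do
    $Q \leftarrow 2Q$
    if $k_i \neq 0$ then $Q \leftarrow Q + k_i P$
  end
  return $Q$


Algorithm: Modified NAF$_w$
Input: window width $w$, $k \in \mathbb{Z}$
Output: mNAF$_w(k)$
  $i \leftarrow 0$
  while $k \geq 1$ do
    if $k$ is odd then $k_i \leftarrow k \bmod\!s\; 2^w$, $k \leftarrow k - k_i$
    else $k_i \leftarrow 0$
    $k \leftarrow k/2$, $i \leftarrow i + 1$
  end
  if $k_{i-1} = 1$ and $k_{i-1-w} < 0$ then
    $k_{i-1-w} \leftarrow k_{i-1-w} + 2^{w-1}$
    $k_{i-1} \leftarrow 0$, $k_{i-2} \leftarrow 1$, $i \leftarrow i - 1$
  end
  return $(k_{i-1}, \ldots, k_0)$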
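
The two algorithms above can be sketched in Python as follows. This is a minimal illustration, not OpenSSL's code: plain integers stand in for curve points (so "doubling" is `2 * Q` and "addition" is `Q + d * P`), which makes it easy to check that the result equals $kP$. The function names and the recorded `D`/`A` operation string are additions for illustration, and the sketch assumes $k \geq 1$ and $w \geq 2$.

```python
def mods(k: int, w: int) -> int:
    """Signed residue of k modulo 2**w, in the range [-2**(w-1), 2**(w-1))."""
    r = k % (1 << w)
    return r - (1 << w) if r >= (1 << (w - 1)) else r


def mnaf(k: int, w: int) -> list:
    """Modified NAF_w digits of k >= 1, least significant digit first."""
    digits = []
    while k >= 1:
        if k & 1:
            d = mods(k, w)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    i = len(digits)
    # MSB rewrite: 1 0^(w-1) delta -> 0 1 0^(w-2) (2^(w-1) + delta), delta < 0.
    if i - 1 - w >= 0 and digits[i - 1] == 1 and digits[i - 1 - w] < 0:
        digits[i - 1 - w] += 1 << (w - 1)
        digits[i - 2] = 1
        digits.pop()  # k_{i-1} <- 0 and i <- i - 1
    return digits


def scalar_mult_ops(k: int, w: int, P: int = 1):
    """Left-to-right scalar multiplication; returns (kP, operation string)."""
    digits = mnaf(k, w)
    Q = digits[-1] * P          # most significant digit, no doubling needed
    ops = []
    for d in reversed(digits[:-1]):
        Q = 2 * Q               # point doubling
        ops.append("D")
        if d != 0:
            Q = Q + d * P       # point addition (table lookup of |d| P)
            ops.append("A")
    return Q, "".join(ops)


print(mnaf(13, 4), scalar_mult_ops(13, 4))
```

For $k = 13$ and $w = 4$ the plain NAF$_4$ digits are $(1, 0, 0, 0, -3)$; the MSB rewrite shortens this to $(1, 0, 0, 5)$ with the same weight.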


3.2 Cache Attack Vulnerability 


Following the description of the mNAF$_w$ representation, knowledge of the curve
operation sequence corresponds directly to the algorithm state, yielding quite a
lot of key data. Point additions take place when a coefficient $k_i \neq 0$ and these
are necessarily followed by $w$ point doublings due to the scalar representation.
From the side channel perspective, consecutive doublings allow inference of zero
coefficients, and more than $w$ point doublings reveals non-trivial zero coefficients.
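
To make the inference concrete, the sketch below turns an observed operation sequence into per-digit knowledge. It assumes the left-to-right loop described above (one doubling per digit after the most significant one, with an addition marking a non-zero digit); the `D`/`A` string encoding and the `"?"` marker for an unknown non-zero digit are illustrative conventions, not notation from the paper.

```python
def digit_pattern(ops: str) -> list:
    """Map a 'D'/'A' operation string to digit knowledge, MSB first.

    0   -> the digit is known to be zero
    '?' -> the digit is non-zero, but its value is not revealed
    """
    pattern = ["?"]              # most significant digit: loaded before the loop
    for op in ops:
        if op == "D":
            pattern.append(0)    # a new digit position; zero unless an 'A' follows
        elif op == "A":
            pattern[-1] = "?"    # the addition marks this digit as non-zero
        else:
            raise ValueError(f"unexpected operation {op!r}")
    return pattern


print(digit_pattern("DDDADDDDDDA"))
```

Every position marked `0` is a recovered coefficient of the scalar, and the position of each `"?"` is known even though its value is not.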

Without any countermeasures, the above scalar multiplication routine is
vulnerable to cache-timing attacks. The points in the precomputation phase are
stored in memory; when a point addition takes place, the point to be added is
loaded into the cache. An attacker can detect this by concurrently running a spy
process [9] that does nothing more than continually load its own data into the
cache and measure the time required to read from all cache lines in a cache set,
iterating the process for all cache sets. Fast cache access times indicate cache
hits: the scalar multiplication routine has not aggressively accessed those cache
locations since the last iteration. Had it done so, it would have evicted the spy
process data from those cache locations, causing cache misses and thus slower
cache access times for the spy process.

In Fig. 1 we illustrate typical cache-timing data obtained from a spy process
running on a Pentium 4 (Top) and Atom (Bottom) with OpenSSL 0.9.8k
performing an ECDSA signature operation concurrently. The top eight rows of
each graph are metadata; the lower half represents the VQ label and the upper
half the algorithm state. We show how we obtained the metadata in Sect. 4 and
Sect. 5, respectively. The remaining cells are the actual cache-timing data. Each
cell in these figures indicates a cache set access time. Technically, time moves
within each individual cell, then from bottom to top through all cache sets, then
from left to right repeating the measurements. To visualize the data, it is
beneficial to consider the data as vectors with length equal to the number of cache
sets, and time simply moves left to right.

To manually analyze such traces and determine what operations are being
performed we look for (dis)similarities between neighboring vectors. These graphs
show seven (Top) and eight (Bottom) point additions, with repeated point
doublings occurring between each addition. As an attacker, we hope to find
correlation between these point additions and the cache access times, which we
easily find here. Additions in the top graph are visible at rows 13 and 24, among
others; in the bottom graph, at rows 6, 7, 55, 56. The reader is encouraged to use
the vector quantization label to help locate the point additions (black label).

Fig. 1. Cache-timing data from a spy process running concurrently with an OpenSSL
0.9.8k ECDSA signature operation; 160-bit curve, mNAF$_4$. Top: Pentium 4 timing
data, seven point additions. Bottom: Atom timing data, eight point additions.
Repeated point doublings occur between the additions. The top eight rows are
metadata; the bottom half the VQ label (Sect. 4) and top half the HMM state
(Sect. 5). All other cells are the raw timing data, viewed as column vectors from
left to right with time.


4 Vector Quantization


Automated analysis of cache-timing data like that shown in Fig. 1 is not a trivial
task. When given just one trace, for simplistic algorithms it is sometimes possible
to interpret the data manually. For many traces or complex algorithms this is
not feasible. We aim to automate the process; the analysis begins with VQ.

A vector quantizer is a map $V : \mathbb{R}^n \to C$ with $C \subset \mathbb{R}^n$,
where the set $C = \{c_1, \ldots, c_\alpha\}$ is called the codebook. A typical
definition is $V : v \mapsto \arg\min_{c \in C} D(v, c)$ where $D$ measures the
$n$-dimensional Euclidean distance between $v$ and $c$. One also associates a
labelling $L : C \to \mathcal{L}$ with the codebook vectors; this can be as trivial
as $\mathcal{L} = \{1, \ldots, \alpha\}$ depending on the application.

Here, we are particularly interested in VQ classification; input vectors are
mapped to the closest vector in the codebook and then assigned the corresponding
label for that codebook entry. In this manner, input vectors with the same
labelling share some user-defined quality and are grouped accordingly. The
classification quality depends on how well the codebook vectors approximate input
data for their label. We elaborate on building the codebook $C$ below.
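
A minimal Python sketch of VQ classification as just defined: the quantizer picks the codebook vector at the smallest Euclidean distance and the classifier returns that vector's label. The function names and the tiny two-dimensional codebook are hypothetical; real cache-timing vectors would have one component per cache set.

```python
import math


def quantize(v, codebook):
    """Index of the codebook vector closest to v (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))


def classify(v, codebook, labels):
    """Label L(V(v)) of the codebook vector that v is mapped to."""
    return labels[quantize(v, codebook)]


# Hypothetical toy codebook: one vector per label.
codebook = [(0.0, 0.0), (10.0, 10.0)]
labels = ["D", "A"]

print(classify((1.0, 2.0), codebook, labels))
print(classify((9.0, 8.0), codebook, labels))
```

`math.dist` is available from Python 3.8; for large codebooks a vectorised distance computation would be the natural replacement.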


4.1 Learning Vector Quantization 


To learn the codebook vectors, we employ LVQ [16]. This process begins with a
set $\mathcal{T} = \{(t_1, l_1), \ldots, (t_j, l_j)\}$ of training vectors and
predetermined corresponding labels, as well as an approximation to $C$. This is
commonly derived by taking the $k$ centroids resulting from $k$-means clustering
[17] on all $t_i$ sharing the same label. LVQ in its simplest form then proceeds as
follows. For each $(t_i, l_i) \in \mathcal{T}$, if $L(V(t_i)) = l_i$ the
classification is correct and the matching codebook vector is pulled closer to
$t_i$; otherwise, it is incorrect and the vector is pushed away. This process is
iterated until an acceptable error rate is achieved.
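
One pass of this simplest form of LVQ might look like the Python sketch below. The learning rate `rate` and the linear update rule are assumptions commonly used for this kind of update, not parameters given in the text; the function name is illustrative.

```python
import math


def lvq1_epoch(codebook, labels, training, rate=0.05):
    """One pass of basic LVQ over the training set; returns the error count.

    codebook: list of mutable vectors (lists of floats), updated in place
    labels:   label of each codebook vector
    training: iterable of (vector, label) pairs
    rate:     hypothetical learning rate controlling the step size
    """
    errors = 0
    for t, l in training:
        i = min(range(len(codebook)), key=lambda j: math.dist(t, codebook[j]))
        if labels[i] == l:
            sign = 1.0           # correct: pull the codebook vector toward t
        else:
            sign = -1.0          # incorrect: push it away from t
            errors += 1
        codebook[i] = [c + sign * rate * (x - c) for c, x in zip(codebook[i], t)]
    return errors


codebook = [[0.0, 0.0], [4.0, 4.0]]
labels = ["D", "A"]
errors = lvq1_epoch(codebook, labels, [((1.0, 1.0), "D")], rate=0.5)
print(errors, codebook)
```

Repeating `lvq1_epoch` until the returned error count is acceptably low corresponds to the iteration described above.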


4.2 Cache-Timing Data Templates 


We apply the above techniques to analyze cache-timing data. Taking the working
example in Fig. 1, the dimension of the cache-timing data vectors is $n = 32$ for
the Pentium 4 and $n = 64$ for the Atom; this is the number of cache sets.
For simplicity we define $\mathcal{L} = \{D, A, E\}$ to label vectors belonging to
respective operations double, addition, or beginning/end.

Next, we build the training data $\mathcal{T}$. This is somewhat simplified for an attacker
as they can create their own private key and generate signatures to produce 
training data. Nevertheless, extracting individual vectors by hand proves quite 
tedious and error-prone. Also, if the spy process executes multiple times, there 
is no guarantee where the memory buffer for the timing data will be allocated. 
From execution to execution, the vectors will likely look quite different. 

Inspired by template attacks [7], we instead modify the software in such a way
that it performs only a single task we would like to distinguish. For the scalar
multiplication routine shown in Sect. 3 we force the algorithm to perform only
point doubling (addition) and collect templates to be used as training vectors
by running the modified algorithm concurrently with the cache spy process,
obtaining the needed cache-timing data. This provides large amounts of training
vectors and corresponding labels to define $\mathcal{T}$ with minimal effort.

One might be tempted to use these vectors in their entirety for C. There are 
a few disadvantages in doing so: 


— This would cause VQ to run slower because #C would be sizable and contain 
many vectors such that $L(c_i) = L(c_j)$ where $D(c_i, c_j)$ is needlessly small;
codebook redundancy in a sense. In practice we may need to analyze copious 
volumes of trace data. 

— We cannot assume the obtained cache-timing data templates are completely 
error-free; we strive to curtail the effect of such erroneous vectors. 


To circumvent these issues, we partition
$\mathcal{T} = \bigcup_{l \in \mathcal{L}} \{(t_i, l_i) : l_i = l\}$ as subsets
of all training vectors corresponding to a single label and subsequently perform
$k$-means clustering on the vectors in each subset. The resulting centroids are
then added to $C$. Finally, with $C$ and $\mathcal{T}$ realized we employ LVQ to
refine $C$. This allows experimentation with different values for $k$ in $k$-means
to arrive at a suitably compact $C$ with small vector classification error rate.
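
The codebook construction described here (partition by label, cluster each subset, collect the centroids) can be sketched as follows. The plain Lloyd-style `kmeans` with "first k vectors" initialisation and a fixed iteration count is a simplification chosen for illustration; the paper does not specify these details, and the function names are hypothetical.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Basic k-means; naive initialisation from the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda j: math.dist(v, centroids[j]))
            clusters[i].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(x) / len(members) for x in zip(*members)]
    return centroids


def initial_codebook(training, k):
    """Cluster the training vectors of each label separately."""
    by_label = {}
    for t, l in training:
        by_label.setdefault(l, []).append(t)
    codebook, labels = [], []
    for l, vecs in by_label.items():
        for c in kmeans(vecs, min(k, len(vecs))):
            codebook.append(c)
            labels.append(l)
    return codebook, labels


training = [((0, 0), "D"), ((0, 2), "D"), ((10, 10), "A"), ((10, 12), "A")]
print(initial_codebook(training, k=1))
```

The resulting codebook has at most $k$ vectors per label, which keeps classification fast and averages out individual erroneous templates before LVQ refinement.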

While we expect quality results from VQ classification, errors will nevertheless 
occur. Furthermore, we are still left with the task of inferring algorithm state. 
To solve this problem, we turn to hidden Markov models. 


5 Hidden Markov Models 


HMMs (see, e.g., [18]) are a common method for modeling discrete-time stochastic
processes. An HMM is a statistical model in which the system being modeled
is assumed to behave like a probabilistic finite state machine with directly
unobservable state. The only way of gaining information about the process is
through the observations that are emitted from each state.

HMMs have been successfully used in many real life applications; for example, 
many modern speech recognition methods are based on HMMs [18]. Their usabil- 
ity is based on the ability to model physical systems and gain information about 
the hidden causes of emitted observations. Thus, it is not very surprising that 
HMMs can be employed in side channel cryptanalysis as well: the target system 
can be viewed as the hidden part of the HMM and the emitted observations as 
information leaked through the side channel. In the following sections, we give 
a formal definition of an HMM, discuss the three basic problems for HMMs and 
describe how HMMs are used in our attack. The methodology should give an 
idea of how to use HMMs in side channel attacks in general. 


5.1 Elements of an HMM 


An HMM models a discrete-time stochastic process with a finite number of
possible states. The state of the process is assumed to be directly unobservable,
but information about it can be gained from symbols that are emitted from each
state. The process changes its state based on a set of transition probabilities
that indicate the probability of moving from one state to another. An observable
symbol is emitted from each process state according to a set of emission
probabilities. An example of an HMM is illustrated in Fig. 2. This HMM models
a system with three internal states, which are denoted by circles in the figure.
Denoted by squares are the two symbols, which can be emitted from the internal
states. The state transition probabilities and the emission probabilities are
denoted by labeled arrows. For example, the probability of moving from state
$s_2$ to $s_3$ is $a_{23}$; the probability of emitting symbol $v_2$ from state
$s_3$ is $b_3(2)$. In this HMM, the process always starts from $s_1$. Generally,
however, there may be several possible first states. The initial state
distribution defines the probability distribution for the first state over the
states of the HMM.




Fig. 2. An example of an HMM 


Formally, an HMM is defined by the set of internal states, the set of observation
symbols, the transition probabilities between internal states, the emission
probabilities for each observable, and the initial state distribution. We denote
the set of internal states by $S = \{s_1, s_2, \ldots, s_N\}$ and the state at time
$t$ by $\omega_t$. Correspondingly, the set of observables is denoted by
$V = \{v_1, v_2, \ldots, v_M\}$ and the observation emitted at time $t$ by $o_t$.
The set of transition probabilities is denoted by $A = \{a_{ij}\}$, where


$$a_{ij} = \Pr(\omega_{t+1} = s_j \mid \omega_t = s_i), \quad 1 \le i, j \le N,$$


such that $\sum_{j=1}^{N} a_{ij} = 1$ for all $1 \le i \le N$. Whenever
$a_{ij} > 0$, there is a direct transition from state $s_i$ to state $s_j$;
otherwise, it is not possible to reach $s_j$ from $s_i$ in a single step. An arrow
in Fig. 2 denotes a positive transition probability. Thus, $s_3$ cannot be reached
from $s_1$ in a single step. The set of emission probabilities is denoted by
$B = \{b_j(k)\}$, where


$$b_j(k) = \Pr(o_t = v_k \mid \omega_t = s_j), \quad 1 \le j \le N, \ 1 \le k \le M.$$


The initial state distribution indicates the probability distribution for the first
state $\omega_1$. It is denoted by $\pi = \{\pi_i\}$, where


$$\pi_i = \Pr(\omega_1 = s_i), \quad 1 \le i \le N.$$


The first state of the HMM in Fig. 2 is always $s_1$, so the initial state
distribution for this HMM is defined as $\pi_1 = 1$ and $\pi_i = 0$ for all
$i \neq 1$. The three probability measures $A$, $B$ and $\pi$ are called the model
parameters. For convenience, we will simply write $\lambda = (A, B, \pi)$ to
indicate the complete parameter set of an HMM.
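
As a concrete illustration, a parameter set $\lambda = (A, B, \pi)$ for a three-state, two-symbol HMM can be written down as nested lists and checked against the constraints above. The numerical values below are hypothetical (the figure's actual probabilities are not given in the text); they only respect the stated structure, namely that $s_3$ is not reachable from $s_1$ in one step and that the process starts in $s_1$.

```python
def is_valid_hmm(A, B, pi, tol=1e-9):
    """Check that (A, B, pi) are non-negative and properly normalised."""
    n = len(pi)
    shape_ok = len(A) == n and len(B) == n and all(len(row) == n for row in A)
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in A)   # sum_j a_ij = 1
    emis_ok = all(abs(sum(row) - 1.0) < tol for row in B)   # sum_k b_j(k) = 1
    init_ok = abs(sum(pi) - 1.0) < tol                      # sum_i pi_i = 1
    nonneg = all(x >= 0 for row in A + B + [pi] for x in row)
    return shape_ok and rows_ok and emis_ok and init_ok and nonneg


# Hypothetical values; only the zero pattern and pi follow the text.
A = [[0.6, 0.4, 0.0],    # a_13 = 0: s_3 is not reachable from s_1 in one step
     [0.0, 0.7, 0.3],
     [0.2, 0.0, 0.8]]
B = [[0.9, 0.1],         # b_j(k): row j, symbol k
     [0.5, 0.5],
     [0.2, 0.8]]
pi = [1.0, 0.0, 0.0]     # the process always starts in s_1

print(is_valid_hmm(A, B, pi))
```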


5.2 The Three Basic Problems for HMMs 


The usefulness of HMMs is based on the ability to model relationships between
internal states and observations. Related to this are the following three problems,
which are commonly called the three basic problems for HMMs in literature (e.g.,
[18]):

Problem 1. Given an observation sequence $O = o_1 o_2 \cdots o_T$ and a model
$\lambda = (A, B, \pi)$, how do we efficiently compute $\Pr(O \mid \lambda)$, the
probability of the observation sequence given the model?

Problem 2. Given an observation sequence $O = o_1 o_2 \cdots o_T$ and a model
$\lambda$, what is the most likely state sequence
$W = \omega_1 \omega_2 \cdots \omega_T$ that produced the observations?

Problem 3. Given an observation sequence $O = o_1 o_2 \cdots o_T$ and a model
$\lambda$, how do we adjust the model parameters $\lambda = (A, B, \pi)$ to
maximize $\Pr(O \mid \lambda)$?


We briefly review the methods used to solve these problems; the reader can refer
to [18] for a detailed overview. Problem 1 is sometimes called the evaluation
problem since it is concerned with finding the probability of a given sequence $O$.
This problem is solved by the forward-backward algorithm (see, e.g., [18]), which
is able to efficiently compute the probability $\Pr(O \mid \lambda)$. Problem 2
poses a problem that is very relevant to our work. It is the problem of finding
the most likely explanation for the given observation sequence. The aim is to
infer the most likely state sequence $W$ that has produced the given observation
sequence $O$. There are other possible optimality criteria [18], but we are
interested in finding $W$ that maximizes $\Pr(W \mid O, \lambda)$. The problem is
known as the decoding problem and it is efficiently solved by the Viterbi
algorithm [19]. Another relevant question is posed by Problem 3, which asks how to
adjust the model parameters $\lambda = (A, B, \pi)$ to maximize the probability of
the observation sequence $O$. Although there is no known analytical method to
adjust $\lambda$ such that $\Pr(O \mid \lambda)$ is maximized, the Baum-Welch
algorithm provides one method to locally maximize $\Pr(O \mid \lambda)$.
The process is often called training the HMM and it typically involves collecting
a set of observation sequences from a real physical phenomenon, which are used
in training. This problem is known as the learning problem.
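
For small models, Problems 1 and 2 can be solved in a few lines. The sketch below gives the forward recursion for $\Pr(O \mid \lambda)$ and the Viterbi recursion for the most likely state sequence, with states and symbols represented as 0-based indices. It works with raw probabilities for readability; for long traces one would switch to logarithms or scaling to avoid underflow. The two-state example parameters are hypothetical.

```python
def forward(obs, A, B, pi):
    """Problem 1: Pr(O | lambda) via the forward recursion."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs, A, B, pi):
    """Problem 2: most likely state sequence for the observations."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []                                   # best predecessor per step
    for o in obs[1:]:
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for prev in reversed(back):                 # backtrack from the last state
        state = prev[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state, two-symbol model.
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
pi = [0.5, 0.5]

print(viterbi([0, 0, 1, 1], A, B, pi))
print(forward([0, 0, 1, 1], A, B, pi))
```

Problem 3 (Baum-Welch) builds on the same forward quantities together with a matching backward pass and is omitted here for brevity.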


5.3 Use of HMMs in Side-Channel Attacks 


HMMs are also useful tools for side channel analysis [4]. Karlof and Wagner [5]
and Green et al. [6] use HMMs for modeling side channel attacks. Their research
is concerned with slightly different problems than ours. We outline the differences
below.


— They only consider Problem 1 and simulate the side channel. As a result, 
Problem 3 is not relevant to their work since the artificial side channel ac- 
tually defines the model that produces the observations. Thus their model 
parameters are known a priori. This is not the case for our work; Problem 3 
is essential. 

— They assume one state transition per key digit, in which case the key can 
be inferred directly from the operation of the algorithm. In our case, the 
operation sequence does not reveal the entire key, but a significant fraction 
of the key nevertheless. We use an HMM in which the states correspond only 
to possible algorithm states. 

— They are additionally interested in derivation of the (secret) scalar k in 
scalar multiplication when the same scalar is used during several runs using 
a process called belief propagation. This is not helpful in our case, since 
(EC)DSA uses nonces. 


A practical drawback of the HMM presented by Karlof and Wagner was that a 
single observable needs to correspond to a single key digit (and internal state). 
Green et al. presented a model, where this is not required: multiple observables 
can be emitted from each state. This is a more realistic model as one system 
state may emit variable length data through the side channel. Our model allows 
this also, but it is based on a different approach. 

In the following sections, we describe the HMM used for modeling the 
OpenSSL scalar multiplication algorithm. We use this model in conjunction with 
VQ to describe the relationship between the states of the algorithm and the side 
channel observations. We also describe how to perform side channel data anal- 
ysis using VQ and the HMM. The aim is to find the most likely state sequence 
for each trace that is obtained from the side channel. The analysis process can 
be divided into two steps: 


1. The VQ codebook is created and the HMM parameters are adjusted accord- 
ing to obtained training sequences. 

2. The actual data analysis is performed. When a sequence of observations 
is obtained from the side channel, we infer the most likely (hidden) state 
sequence that has emitted these observations using VQ and the HMM. 


Since these states correspond to the internal states of the system, we are able to 
determine a good estimate of what operations have been done. This information 
allows us to recover the key. 

The following sections give a framework for performing side channel attacks 
on any system. The main requirements are that we know the specification of 
the system and have access to do experiments with it or are able to accurately 
model it. 


The HMM for Scalar Multiplication. We construct an HMM where the
hidden part models the operation of the algorithm (in this case, scalar
multiplication using the modified NAF$_w$ representation), which leaks information
about the algorithm state through the side channel. An illustration of this part
(without the transition probabilities) is presented in Fig. 3. The state set is
defined as $S = \{s_1, \ldots, s_8\}$. Each label denotes the operation that is
performed in the corresponding state. In addition, there are separate states to
denote the system state preceding and following the execution of the algorithm.
These states are denoted by $s_1$ and $s_8$, respectively. OpenSSL uses mNAF$_4$
for scalars in the case of the 160-bit curve order we are experimenting with, so
each point addition is followed by at least 4 point doublings, except in the
beginning or end of the process. The states $s_3, \ldots, s_6$ represent these
doublings. The most significant digit is handled by the first addition state $s_2$.


Fig. 3. An HMM transition model for modified NAF$_w$ scalar multiplication


As can be seen from Fig. 1, the execution of one point doubling or point addition
spans several column vectors in the trace. Hence, we should let the internal
states emit multiple observations instead of just one. Green et al. [6] solved this
problem by introducing an additional variable that counts the cumulative number
of emitted observables. This has the drawback of considerably expanding
the state space. To avoid this, we solve the problem by introducing substates in
each HMM state. One main state consists of a sequence of substates, which are
just ordinary HMM states that always emit one observation. Thus, all previously
introduced techniques can be used for our HMM.
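
One way to picture the substate idea is to expand a main-state transition matrix into a larger, ordinary transition matrix. The Python sketch below is an assumption-laden illustration: the text does not give the substate topology, so the left-to-right chain, the self-loop probability `stay`, and the function name `expand` are all hypothetical choices.

```python
def expand(main_A, sub_counts, stay=0.5):
    """Expand main-state transitions into a substate transition matrix.

    main_A:     row-stochastic matrix over main states
    sub_counts: number of substates in each main state
    stay:       hypothetical probability of repeating the current substate
    Returns (A, owner) where owner[s] is the main state of substate s.
    """
    offsets, total = [], 0
    for c in sub_counts:
        offsets.append(total)
        total += c
    A = [[0.0] * total for _ in range(total)]
    owner = []
    for m, c in enumerate(sub_counts):
        owner += [m] * c
        for j in range(c):
            s = offsets[m] + j
            A[s][s] = stay                      # emit one more observation here
            if j + 1 < c:
                A[s][s + 1] = 1.0 - stay        # advance inside the main state
            else:
                # Last substate: leave according to the main-state transitions,
                # always entering the first substate of the next main state.
                for m2, p in enumerate(main_A[m]):
                    A[s][offsets[m2]] += (1.0 - stay) * p
    return A, owner


# Toy example: main state 0 leads to main state 1, which repeats.
A, owner = expand([[0.0, 1.0], [0.0, 1.0]], sub_counts=[2, 3])
print(owner)
```

Because every substate is an ordinary HMM state emitting exactly one observation, the expanded matrix can be passed unchanged to standard forward, Viterbi, or Baum-Welch routines.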

The set of observables for this HMM is V = {D, A, E}, which is the same 
set used for labeling cache-timing data vectors in Sect. Ø We assume that the 
additions emit mainly As and the doublings mainly Ds. The sı and sg states 
are assumed to emit mainly Es. These symbols are connected with side chan- 
nel observations using VQ as described in Sect. K| Each vector observation is 
labeled according to which state—A, D or E—they correspond to. When a 
new side channel observation is obtained, it can be classified as A, D or E by 
taking the label of the closest codebook vector. An example of this is shown 
in Fig. [J where the rows directly above the observations represent the quan- 
tized values. Symbols A and D are indicated using darker and lighter shades, 
respectively. 
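
The labeling step described above can be sketched as a nearest-neighbor lookup. This is a minimal illustration only: the three-component template vectors and the Euclidean distance are assumptions made for the example, not the codebook or metric used in the experiments.

```python
import math

# Hypothetical codebook: each entry pairs a template vector with its label.
# Real codebook vectors would be built from cache-timing templates.
CODEBOOK = [
    ((9.0, 1.0, 1.0), "A"),  # addition-like access pattern (made up)
    ((1.0, 9.0, 1.0), "D"),  # doubling-like access pattern (made up)
    ((1.0, 1.0, 1.0), "E"),  # neither operation running (made up)
]


def quantize(observation, codebook=CODEBOOK):
    """Return the label of the codebook vector closest to the observation."""
    _, label = min(codebook, key=lambda entry: math.dist(entry[0], observation))
    return label


def label_trace(trace, codebook=CODEBOOK):
    """Map a sequence of observation vectors to a sequence of symbols."""
    return [quantize(vector, codebook) for vector in trace]
```

Calling `label_trace` on a list of vectors yields a symbol sequence over {A, D, E} that can serve as HMM observation input.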

Training of the HMM. Training starts by setting the initial model parameters.
These parameters can be rough estimates, since they will be improved
during training. To train the model, we obtain a set of sequences in the HMM 
observation domain. These sequences can be created from the side channel obser- 
vations as we know how the algorithm operates. The obtained sequences are used 
for model parameter re-estimation, which is performed using the Baum-Welch 
algorithm [Z0]. Next, we create the codebook for VQ as shown in Sect. J] 


Inference of the State Sequence. Given a set of side channel observation 
sequences from the real target system, we can infer the most likely hidden state 
sequence for each of them. The first step is to perform VQ, that is, to tag the
observations with the label of the closest codebook vector. Thus, we get a set 
of sequences in the HMM observation domain. By applying the Viterbi algo- 
rithm [19], we finally obtain the most likely state sequence for each observation 
sequence. These state sequences are actually sequences of substates; the actual 
operation sequence can be recovered based on the transitions that are taken in 
each state sequence. An example of this is shown in Fig. []] where the upper rows 
represent the main states of the algorithm. Additions are indicated using black; 
doublings are indicated using lighter shades. For example, the first addition on 
the top trace in Fig. [is followed by five doublings. 
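
As an illustration of the inference step, the following is a minimal log-domain Viterbi sketch. The three-state model and all probabilities in it are invented for the example; the model in the text uses substates and parameters obtained by training, neither of which is reproduced here.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for obs."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s]: best log-probability of any path that ends in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back = []
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            score[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][symbol])
            pointers[s] = best
        back.append(pointers)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy model (assumed values, not trained parameters).
STATES = ("idle", "add", "double")
START = {"idle": 0.8, "add": 0.1, "double": 0.1}
TRANS = {
    "idle": {"idle": 0.6, "add": 0.3, "double": 0.1},
    "add": {"idle": 0.1, "add": 0.5, "double": 0.4},
    "double": {"idle": 0.1, "add": 0.2, "double": 0.7},
}
EMIT = {
    "idle": {"A": 0.1, "D": 0.1, "E": 0.8},
    "add": {"A": 0.8, "D": 0.1, "E": 0.1},
    "double": {"A": 0.1, "D": 0.8, "E": 0.1},
}
```

With a substate model, runs of identical states in the returned path would then be collapsed into single operations.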

The state sequences obtained in this step can be used in conjunction with 
some other method to mount a key recovery attack. In the simplest case, the 
state sequence reveals the secret key directly and no other methods are needed. 
However, with mNAF, this is not the case; we discuss a few practical applications 
in the next section, as well as give our empirical results. 


6 Results 


Depending on the attack scenario and the number of traces available, there are 
at least two interesting ways to apply the analysis to the case of mNAF_w and
OpenSSL. The first assumes access to only a single or similarly small number of 
traces, while the second assumes access to a signature oracle and corresponding 
side channel information. 


Solving Discrete Logs. We consider special versions of the baby-step giant- 
step algorithm for searching restricted exponent spaces; see BI] Sect. 3.6] for a 
good overview. 

The length-$\ell$ mNAF_w representation has maximum weight $\ell/w$ and average
weight $\ell/(w+1)$; we denote this weight as $t$. We assume that the analysis provides
us with the position of non-zero coefficients, but not their explicit value or sign;
thus each coefficient gives $w-1$ bits of uncertainty. One can then construct a
baby-step giant-step algorithm to solve the ECDLP in this restricted keyspace.
The time and space complexity is $O(2^{(w-1)t/2})$; note that this does not directly
depend on $\ell$ (or further, the group order $n$). For the curve under consideration,
this gives a worst case of $O(2^{60})$ and on average $O(2^{48})$, whereas the complexity
without any such side channel information is $O(2^{80})$.
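
The exponents can be checked with a few lines of arithmetic. The sketch below assumes a scalar length of 160 bits and a window width of w = 4 (inferred here from the "at least 4 point doublings" remark, not stated as a parameter in this passage), and evaluates the exponent (w - 1)t/2 for the maximum and average weights.

```python
def bsgs_exponent(w, weight):
    """log2 of the baby-step giant-step cost, i.e. (w - 1) * t / 2."""
    return (w - 1) * weight / 2


ELL = 160  # scalar length in bits (assumed equal to the curve order size)
W = 4      # window width; an assumption for this example

worst_case = bsgs_exponent(W, ELL / W)        # maximum weight ell / w
average = bsgs_exponent(W, ELL / (W + 1))     # average weight ell / (w + 1)
generic = ELL / 2                             # square-root attack, no side channel
```

Under these assumptions the three exponents come out as 60, 48 and 80 bits, respectively.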


680 B.B. Brumley and R.M. Hakala 


Lattice Attacks. Despite this reduced complexity, an attacker cannot trivially 
carry out the attack outlined above on a normal desktop PC. Known results on 
attacking signature schemes with partial knowledge of nonces include [22,23]; the
approach is a lattice attack. Formally, the attacker obtains tuples $(r_i, s_i, m_i, k_i)$
consisting of a signature (2), message, and partial knowledge of the nonce k obtained
through the timing data analysis. For our experiments, not all such tuples are useful
in the lattice attack. Using the formalization of [22], we assume $k_i$ tells us


$k_i = z'_i + 2^{\alpha_i} z_i + 2^{\beta_i} z''_i$


with $z_i$ the only unknown on the right. Our empirical timing data analysis results
show that the majority of errors occur when too many or too few doublings are placed
between additions; a synchronization error in a sense. So the farther we move
towards the MSB, the more likely it is that we have erroneous indexing $\alpha_i$, $\beta_i$
and the lattice attack will likely fail.

To mitigate this issue, we instead focus only on the LSBs. We disregard the
upper term by setting $z''_i = 0$ and consider only tuples where $k_i$ indicates that
$z'_i = 0$ and $\alpha_i \geq 6$; that is, the LSBs of $k_i$ are 000000. For k chosen uniformly,
this should happen with the reasonable probability of $2^{-6}$. Our empirical results
are in line with those of [22]: For a 160-bit group order, 41 such tuples is usually
enough for the lattice attack to succeed in this case.

Lattice attacks have no recourse to compensate for errors. If our analysis 
determines $z'_i = 0$ but in fact $z'_i \neq 0$ for some i, that instance of the lattice
attack will fail. We thus adopt the naive strategy of taking random samples of 
size 41 from the set of tuples until the attack succeeds; an attacker can always 
check the correctness of a guess by calculating the corresponding public key and 
comparing it to the actual public key. This strategy is only feasible if the ratio 
of error-free tuples to erroneous tuples is high. 

Finally, we present the automated lattice attack results; 8K signatures with 
messages and traces were obtained in both cases. 


Pentium 4 results. The analysis yielded 122 tuples indicating $z'_i = 0$. The
long-term private key d (2) was recovered after 1007 lattice attack iterations 
(107 correct, 15 incorrect). The analysis ran in less than an hour on a Core 
2 Duo. 

Atom results. The analysis yielded 147 tuples indicating $z'_i = 0$. We recovered
d after a total of 37196 lattice attack iterations (115 correct, 32 incorrect). 
Our analysis is less accurate in this case, but still accurate enough to recover 
the key in only a few hours on a Core 2 Duo. 
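
The random-sampling strategy can be sanity-checked against the reported counts. The sketch below makes an idealized assumption that is not claimed in the text: a lattice attack instance succeeds exactly when all 41 sampled tuples are error-free. Under that assumption the number of iterations is geometric, with success probability given by a ratio of binomial coefficients.

```python
from math import comb


def clean_sample_probability(correct, incorrect, sample_size=41):
    """Probability that a uniform random sample contains no erroneous tuple."""
    total = correct + incorrect
    return comb(correct, sample_size) / comb(total, sample_size)


def expected_iterations(correct, incorrect, sample_size=41):
    """Mean number of samples drawn until the first error-free one."""
    return 1 / clean_sample_probability(correct, incorrect, sample_size)


pentium4 = expected_iterations(correct=107, incorrect=15)
atom = expected_iterations(correct=115, incorrect=32)
```

For the Pentium 4 counts this idealized estimate is several hundred iterations, the same order of magnitude as the 1007 iterations reported; the Atom counts give a much larger estimate, which illustrates why the strategy depends on a high ratio of error-free tuples.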


Summary. We omit strategies for finding correlation between the traces and 
specific key digits. This can be tremendously helpful in further reducing the 
search space when trying to solve the ECDLP. As such, given only one or a few 
traces, this analysis method should be used as a tool in conjunction with other 
heuristics to trim the search space. The lattice attack given here is proof-of- 
concept. The results suggest that significantly fewer signatures are needed. In 
practice one can perform a much more intelligent lattice attack, perhaps even 
considering lattice attacks that account for key digit reuse [24]. 


7 Countermeasures


An implementation should not rely on any one countermeasure for side channel 
security, but rather a combination. We briefly discuss countermeasures, with an 
emphasis on preventing the specific weakness we exploited in OpenSSL. 


Scalar Blinding. One often-proposed strategy is to blind the scalar
k from the point multiplication routine using randomization. One form is
$(k + mn + \bar{m})P - \bar{m}P$ with $m$, $\bar{m}$ small (e.g. 32-bit) and random. The calculation
is then carried out using multi-exponentiation with interleaving. With such
a strategy, it suffices that $\bar{m}$ is low weight, not necessarily short.

Randomized Algorithms. Use random addition-subtraction chains instead of 
highly regular double-and-add routines. Oswald gave an example and 
a subsequent attack Ø]. Published algorithms tend to be geared towards 
hardware or resource restricted devices; see for a good review. In a 
software package like OpenSSL that normally runs on systems with abundant 
memory, one does not have to rely on simple randomized recoding and can 
build more flexible addition-subtraction chains. 

Shared Context. In OpenSSL’s ECC implementation, the results and illustra- 
tion in Fig. [suggest what is most visible in the traces is not the lookup from 
the precomputation table, but the dynamic memory for variables in the point 
addition and doubling functions. OpenSSL is equipped with a shared con- 
text pp. 106-107] responsible for allocating memory for curve and finite 
field arithmetic. Memory from this context should be served up randomly to 
prevent a clear fixed memory access pattern. 

Operation Balancing. In addition to the above shared context, coordinate 
systems and point addition formulae that are balanced in the number and 
order of operations are also useful; [BI] gives an example. 
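
The scalar blinding identity from the first countermeasure can be illustrated in a toy group. The sketch below is written multiplicatively in the integers modulo a prime instead of on an elliptic curve, and computes the two terms separately instead of using interleaved multi-exponentiation; the modulus, base and helper name are assumptions chosen for the example.

```python
import secrets

P_MOD = 2**127 - 1  # a Mersenne prime; toy parameter, not from the paper
G = 3               # arbitrary base element of the toy group
N = P_MOD - 1       # G**N == 1 (mod P_MOD) by Fermat's little theorem


def blinded_power(k, bits=32):
    """Compute G**k via the blinded form (k + m*N + m_bar) and -m_bar."""
    m = secrets.randbits(bits)
    m_bar = secrets.randbits(bits)
    blinded = pow(G, k + m * N + m_bar, P_MOD)  # analogue of (k + mn + m_bar)P
    unblind = pow(G, -m_bar, P_MOD)             # analogue of -m_bar P
    return blinded * unblind % P_MOD
```

Each call uses fresh random values, so the exponent processed by the exponentiation routine differs between calls while the result stays the same.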


The above countermeasures are restricted to the software engineering view. Clearly,
operating system-level and hardware-level countermeasures are additionally
possible. We leave general countermeasures to this type of attack as an open question.


8 Conclusion 


We summarize our contributions as follows: 


— We introduced a method for automated cache-timing data analysis, facilitat- 
ing discovery of critical algorithm state. This is the first work we are aware of 
that provides this at a framework level, e.g. not specific to one cryptosystem. 
Consequently, it bridges the gap between cache attack vulnerabilities
and attacks requiring partial key material [22,23].

— We showed how to apply HMM cryptanalysis to cache-timing data; to the 
best of our knowledge, its first published application to real traces. This 
builds on existing work in the area of abstract side channel analysis using 
HMMs ØB], yet departs by tackling practical issues inherent to concrete 
side channels. 

— We demonstrated the method is indeed practical by carrying out an attack
on the elliptic curve portion of OpenSSL using live cache-timing data. The 
attack resulted in complete key recovery, with the analysis running in a 
matter of hours on a normal desktop PC. 


The method works by: 


1. Creating cache-timing data vector templates that reflect the algorithm’s 
cache access behavior. 

2. Using VQ to match incoming cache-timing data to these existing templates. 

3. Using the output as observation input to an HMM that accurately models 
the control flow of the algorithm. 


The setup phase, including acquiring the templates used to build the VQ code- 
book vectors and learning the HMM parameters, is the only part by definition 
requiring any manual work, and the majority of that can in fact be automated 
by simple modifications to the software under attack. This attack scenario is de- 
scribed for hardware power analysis in [/], but is perhaps even a greater practical 
threat in this case due to the inherent malleability of software. After the setup 
phase, cache-timing data analysis is fully automated and requires negligible time. 

The analysis given here is meant not only for attacking implementations,
but for defending them as well. We encourage software developers to analyze
their implementations using these methods to discover memory access patterns 
and apply appropriate countermeasures. 


Future Work 


One might think to forego the VQ step and use the cache-timing data directly as 
the sole input to the HMM. In our experience, this only complicates the model
and degrades the quality of the results.

The example we gave was tailored to data caches, in particular the L1 data 
cache. Other data caches could prove equally fruitful. We also plan to apply the 
analysis method to instruction caches. 

While the attack results we gave were for one particular cryptosystem im- 
plementation, the analysis method has a much wider range of applications. We 
in fact found a similar vulnerability in the NSS library’s implementation of el- 
liptic curves. Departing from elliptic curves and public key cryptography, we 
plan to apply the analysis to an assortment of implementations, asymmetric and 
symmetric primitives alike. 

One of the more interesting planned applications is to algorithms with good 
side channel resistance properties, such as “Montgomery’s ladder”. While this 
might be an overwhelming challenge for traditional power analysis, the work 
here emphasizes the fact that cache-timing attacks are about memory access 
patterns; a fixed sequence of binary operations cannot be assumed sufficient to 
thwart cache-timing attacks. 


Acknowledgments. We thank the following people for comments and discus- 
sions: Dan Bernstein, Kimmo Jarvinen, Kaisa Nyberg, and Dan Page. 


Abstract. Physical attacks on cryptographic implementations and devices
have become crucial. In this context a recent line of research on a
new class of side-channel attacks, called memory attacks, has received in- 
creasingly more attention. These attacks allow an adversary to measure a 
significant fraction of secret key bits directly from memory, independent 
of any computational side-channels. 

Physically Unclonable Functions (PUFs) represent a promising new
technology that allows secrets to be stored in a tamper-evident and
unclonable manner. PUFs derive their security from physical structures at
submicron level and are very useful primitives to protect against memory
attacks.

In this paper we aim at making the first step towards combining and 
binding algorithmic properties of cryptographic schemes with physical 
structure of the underlying hardware by means of PUFs. We introduce a 
new cryptographic primitive based on PUFs, which we call PUF-PRFs. 
These primitives can be used as a source of randomness like pseudorandom
functions (PRFs). We construct a block cipher based on PUF-PRFs
that allows simultaneous protection against algorithmic and physical at- 
tackers, in particular against memory attacks. While PUF-PRFs in gen- 
eral differ in some aspects from traditional PRFs, we show a concrete 
instantiation based on established SRAM technology that closes these 
gaps. 


1 Introduction 


Modern cryptography provides a variety of tools and methodologies to analyze 
and to prove the security of cryptographic schemes such as in BIEI]. These 
proofs always start from a particular setting with a well-defined adversary model 
and security notion. The vast majority of these proofs assume a black box model: 
the attacker knows all details of the used algorithms and protocols but has no 
knowledge of or access to the secrets of the participants, nor can he observe 
any internal computations. The idealized model allows one to derive security 
guarantees and gain valuable insights. 

However, as soon as this basic assumption fails most security guarantees are
off and a new open field of study arises. In cryptographic implementations long- 
term secret keys are typically stored by configuring a non-volatile memory such 
as ROM, EEPROM, flash, anti-fuses, poly or e-fuses into a particular state. 
Computations on these secrets are performed by driving electrical signals from 
one register to the next and transforming them using combinatorial circuits 
consisting of digital gates. Side-channel attacks pick up physically leaked key- 
dependent information from internal computations, e.g. by observing consumed 
power [27] or emitted radiation [I], making many straightforward algorithms and 
implementations insecure. It is clear that from an electronic hardware point of 
view, security is viewed differently, see e.g. BOMO4844). 

Even when no computation is performed, stored secret bits may leak. For 
instance, in [43] it was shown that data can be recovered from flash memory 
even after a number of erasures. By decapsulating the chip and using scanning 
electron microscopes or transmission electron microscopes the states of anti-fuses 
and flash can be made visible. Similarly, a typical computer memory is not erased 
when its power is turned off giving rise to so-called cold-boot attacks [22]. More
radical approaches such as opening up an integrated circuit and probing metal 
wires or scanning non-volatile memories with advanced microscopes or lasers 
generally lead to a security breach of an algorithm, often immediately revealing 
an internally stored secret [23]. 

Given this observation, it becomes natural to investigate security models with 
the basic assumption: memory leaks information on the secret key. Consequently, 
a recently started line of work has investigated the use of new cryptographic 
primitives that are less vulnerable to leakage of key bits BBO]. These works 
establish security by adapting public-key algorithms to remain secure even after 
leaking a limited number of key bits. However, no security guarantees can be 
given when the leakage exceeds a certain threshold, e.g. when the whole non- 
volatile memory is compromised. Furthermore, they do not provide a solution 
for the traditional settings, e.g. for securing symmetric encryption schemes. 

Here we explore an alternative approach: Instead of making another attempt 
to solve the problem in an algorithmic manner, we base our solution on a new 
physical primitive. So-called Physically Unclonable Functions (PUFs) provide a 
new cryptographic primitive able to store secrets in a non-volatile but highly 
secure manner. When embedded into an integrated circuit, PUFs are able to use 
the deep submicron physical uniqueness of the device as a source of randomness 
[5.420147]. Since this randomness stems from the uncontrollable subtleties of 
the manufacturing process, rather than from hard-wired bits, it is practically 
infeasible to externally measure these values during a physical attack. Moreover, 
any attempt to open up the PUF in order to observe its internals will with 
overwhelming probability alter these variables and change or even destroy the 
PUF Ø]. 

In this paper, we take advantage of the useful properties of PUFs to build 
an encryption scheme resilient against memory leakage adversaries as defined in 
[2]. We construct a block cipher that explicitly makes use of the algorithmic and
physical properties of PUFs to protect against physical and algorithmic attacks
at the same time. Other protection mechanisms against physical attacks require 
either additional algorithmic effort, e.g. 243445139), on the schemes or separate 
(possibly expensive) hardware measures. 

Our encryption scheme can particularly be used for applications such as se- 
cure storage of data on untrusted storage (e.g., harddisk) where (i) no storage 
of secrets for encryption/decryption is needed and keys are only re-generated 
when needed, (ii) copying the token is infeasible (unclonability), (iii) temporary 
unauthorized access to the token will reveal data to the adversary but not the 
key, or (iv) no non-volatile memory is available. 


Contribution. Our contributions are as follows: 


A new cryptographic primitive: PUF-PRF. We place the PUFs at the core of a 
pseudorandom function (PRF) construction that meets well-defined properties. 
We provide a formal model for this new primitive that we refer to as PUF-PRFs.
PRFs are fundamental primitives in cryptography and have many
applications, e.g. see [S8253].


A PUF-PRF-based provably secure block cipher. One problem with our PUF- 
PRF construction is that it requires some additional helper data that inevitably 
leaks some internal information. Hence, PUF-PRFs cannot serve as a direct 
replacement for PRFs. However, we present a provably secure block cipher based 
on PUF-PRFs that remains secure despite the information leakage. Furthermore, 
no secret key needs to be stored, protecting the scheme against memory leakage 
attacks. The tight integration of PUF-PRFs into the cryptographic construction 
improves the tamper-resilience of the overall design. Any attempt at accessing 
the internals of the device will result in a change of the PUF-PRF. Hence, no 
costly error detection networks or alternative anti-tampering technologies are 
needed. The unclonability and tamper-resilience properties of the underlying 
PUFs allow for elegant and cost-effective solutions to specific applications such 
as software protection or device encryption. 


An improved and practical PUF-PRF construction. Although the information 
leakage through helper data is unavoidable in the general case, the concrete case 
might allow for more efficient and secure constructions. We introduce SRAM- 
PRFs, based on so-called SRAM PUFs, which are similar to the general PUF- 
PRFs but where it can be shown that no information is leaked through the 
helper data if run in an appropriate mode of operation. Hence, SRAM-PRFs are 
in all practical views a physical realization of expanding PRFs. 


Organization. This paper is organized as follows. First, we give an overview
of related work in Section 2. In Section 3, we define and justify the considered
attacker model. In Section 4, we introduce a formal model for PUFs. Based on
this, we define in Section 5 a new cryptographic primitive, termed PUF-PRFs.
Furthermore, we present a provably secure block cipher based on PUF-PRFs
that is secure despite the information leakage through helper data. We then
explain for the concrete case of SRAM PUFs an improved construction that
shares the same benefits as general PUF-PRFs but where it can be argued that
the helper data does not leak any information. Finally, we present
the conclusions.


2 Related Work 


In recent years numerous results in the field of physical attacks emerged showing 
that the classical black box model is overly optimistic, see e.g. A243 82327. 
Due to a number of physical leakage channels, the adversary often learns (part of) 
a stored secret or is able to observe some intermediate results of the private com- 
putations. These observations give him a powerful advantage that often breaks 
the security of the entire scheme. To cope with this reality, a number of new 
theoretic adversary models were proposed, incorporating possible physical leak- 
age of this kind. Ishai et al. BA model an adversary which is able to probe, i.e. 
eavesdrop, a number of lines carrying intermediate results in a private circuit, 
and show how to create a secure primitive within this computational leakage 
model. Later, generalizations such as Physically Observable Cryptography pro- 
posed by Micali et al. BJ investigate the situation where only computation leaks 
information while assuming leak-proof secret storages. Recently, Pietrzak 
and Standaert et al. put forward some new models and constructions taking 
physical side-channel leakage into account. 

Complementary to the computation leakage attacks, another line of work 
explored memory leakage attacks: an adversary learns a fraction of a stored secret 
2536]. In [2], Akavia et al. introduced a more realistic model that considers the
security against a wide class of side-channel attacks when a function of the secret
key bits is leaked. Akavia et al. further showed that Regev’s lattice-based scheme
E] is resilient to key leakage. More recently Naor et al proposed a generic 
construction for a public-key encryption scheme that is resilient to key leakage. 
Although all these papers present strong results from a theoretical security point 
of view, they are often much too expensive to implement on an integrated circuit 
(IC), e.g. the size of private circuits in [24] blows up with $O(n^2)$ where n denotes
the number of probings by the adversary. Moreover, almost all of these proposals 
make use of public-key crypto primitives, which introduce a significant overhead 
in systems where symmetric encryption is desired for improved efficiency. 

Besides the information leakage attacks mentioned above, another important
field of study is tampering attacks. Numerous countermeasures have been
discussed, e.g., use of a protective coating layer or the application of error 
detection codes (EDCs) [25916]. Observe that limitations and benefits of tamper- 
proof hardware have likewise been theoretically investigated in a series of works 


3 Memory Attacks


In this work we consider an extension of memory attacks as introduced by Akavia
et al. [2] where the attacker can extract a bounded number of bits of a stored
secret. The model allows for covering a large variety of different memory attacks,
e.g., cold boot attacks described in [22]. However, this general model might not
adequately capture certain concrete scenarios. For example, feature sizes on ICs
have shrunk to nanometer levels and probing such fine metal wires is a difficult
task even for high-end IC manufacturers. During a cryptographic computation
a secret state is (temporarily) stored in volatile memory (e.g. in registers and 
flip-flops). In a typical IC, these structures are relatively small compared to the 
rest of the circuit, making them very hard to locate and scan properly. Thus, 
applying these attacks is usually significantly physically more involved for the 
case of embedded ICs than for the non-embedded PC setting where additional 
measures to access the memory exist, e.g., through software and networks. 

On the other hand, storing long-term secrets, such as private keys, requires 
non-volatile memory, i.e. memory that sustains its state while the embedding 
device is powered off. Implementation details of such memories like ROM, EEP- 
ROM, flash, anti-fuses, poly or e-fuses and recent results on physical attacks 
such as [43] indicate that physically attacking non-volatile memory is much eas- 
ier than attacking register files or probing internal busses on recent ICs, making 
non-volatile memory effectively the weak link in many security implementations. 

Motivated by these observations, we consider the following attacker model in 
this work: 


Definition 1 (Non-volatile Memory Attacker). Let $\alpha: \mathbb{N} \to \mathbb{N}$ be a function
with $\alpha(n) \leq n$ for all $n \in \mathbb{N}$, and let S be a secret stored in non-volatile
memory. An $\alpha$-non-volatile memory attacker $\mathcal{A}$ can access an oracle O that takes
as input adaptively chosen polynomial-size circuits $h_i$ and outputs $h_i(S)$ under
the condition that the total number of bits that $\mathcal{A}$ gets as a result of oracle queries
is at most $\alpha(|S|)$.

The attacker is called a full non-volatile memory attacker if $\alpha = \mathrm{id}$, that is,
the attacker can extract the whole content of the non-volatile memory.
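
The oracle in Definition 1 can be pictured as a stateful object that tracks a leakage budget. The following sketch is only an illustration: the class name is hypothetical, ordinary Python callables stand in for polynomial-size circuits, and the secret is modeled as a tuple of bits.

```python
class LeakageOracle:
    """Answers leakage queries h(S) until alpha(|S|) bits have been output."""

    def __init__(self, secret_bits, alpha):
        self._secret = tuple(secret_bits)
        self._budget = alpha(len(self._secret))  # total bits that may leak

    def query(self, h):
        """Evaluate the leakage function h on the secret, charging its output length."""
        out = tuple(h(self._secret))
        if len(out) > self._budget:
            raise PermissionError("leakage bound exceeded")
        self._budget -= len(out)
        return out
```

Passing `alpha=lambda n: n` models the full non-volatile memory attacker, who may read the entire secret with a single identity query.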


Obviously, protection against full non-volatile memory attackers is only possi- 
ble if no long-term secrets are stored within non-volatile memory. One obvious 
approach is to require a user password before each invocation. However, this 
reduces usability and is probably subject to password attacks. In this paper, 
we use another approach and make use of a physical primitive called Physically
Unclonable Function (PUF). PUFs make it possible to intrinsically store permanent
secrets which are, according to the current state of knowledge, not accessible to a
non-volatile memory attacker.


4 Physically Unclonable Functions 


In this section, we introduce a formal model for Physically Unclonable Functions 
(PUFs). We start with some basic definitions. For a probability distribution D, 
the expression $x \leftarrow D$ denotes the event that x has been sampled according to D.
For a set S, $x \stackrel{*}{\leftarrow} S$ means that x has been sampled uniformly at random from S. For
$m \geq 1$, we denote by $U_m$ the uniform distribution on $\{0,1\}^m$. The min-entropy


690 F. Armknecht et al. 


$H_\infty(D)$ of a distribution D is defined by $H_\infty(D) \stackrel{\text{def}}{=} -\log_2(\max_x \Pr[x \leftarrow D])$.
Min-entropy can be viewed as the “worst-case” entropy in a random variable 
sampled according to D [O] and specifies how many nearly uniform random bits 
can be extracted from it. 
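
For a finite distribution given explicitly, the min-entropy is a one-line computation. The sketch below assumes the distribution is supplied as a dictionary mapping outcomes to probabilities; the function name is chosen for the example.

```python
import math


def min_entropy(dist):
    """H_inf(D) = -log2(max_x Pr[x <- D]) for a dict of outcome probabilities."""
    return -math.log2(max(dist.values()))


uniform_2bit = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
biased = {"00": 0.5, "01": 0.25, "10": 0.25}
```

The uniform distribution on two bits has min-entropy 2, while the biased example has min-entropy 1 because its most likely outcome occurs with probability 1/2.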

A distinguisher $\mathcal{D}$ is a (possibly probabilistic) algorithm that aims to
distinguish between two different distributions D and D'. More precisely, $\mathcal{D}$ receives
some values (which may depend on adaptively chosen inputs by $\mathcal{D}$) and outputs
a value from {0,1}. The advantage of $\mathcal{D}$ is defined by
$\mathrm{Adv}(\mathcal{D}) \stackrel{\text{def}}{=} |\Pr[1 \leftarrow \mathcal{D} \mid D] - \Pr[1 \leftarrow \mathcal{D} \mid D']|$.
Furthermore, we define the advantage of distinguishing
between D and D' as $\max_{\mathcal{D}} \mathrm{Adv}(\mathcal{D})$.

In a nutshell, PUFs are physical mechanisms that accept challenges and return
responses, that is, they behave like functions. The main properties of PUFs
that are important in the context of cryptographic applications are noise (same 
challenge can lead to different (but close) responses), non-uniform distribution 
(the distribution of the responses is usually non-uniform), independence (two dif- 
ferent PUFs show completely independent behavior), unclonability (no efficient 
process is known that allows for physically cloning PUFs), and tamper evidence 
(physically tampering with a PUF will most likely destroy its physical structure, 
making it unusable, or turn it into a new PUF). We want to emphasize that the 
properties above are of a physical nature and hence are very hard to prove in the 
rigorous mathematical sense. However, they are based on experiments conducted 
worldwide and reflect the current assumptions and observations regarding PUFs, 
e.g., see [IZ]. We first provide a formal definition for noisy functions before we 
give a definition for PUFs. 


Definition 2 (Noisy functions). For three positive integers $\ell, m, \delta \in \mathbb{N}$ with
$0 \leq \delta \leq m$, an $(\ell, m, \delta)$-noisy function $f^*$ is a probabilistic algorithm which accepts
inputs (challenges) $x \in \{0,1\}^\ell$ and generates outputs (responses) $y \in \{0,1\}^m$
such that the Hamming distance between two outputs to the same input is at
most $\delta$. In a similar manner, we define an $(\ell, m, \delta)$-noisy family of functions to
be a set of $(\ell, m, \delta)$-noisy functions.
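
A software stand-in for a noisy function in the sense of Definition 2 can be sketched as follows. This is purely illustrative and models no physical PUF: the class name is hypothetical, the noise-free reference response is derived with SHA-256 (which limits m to 256 in this sketch), and each evaluation flips at most half of the allowed number of bits, so any two responses to the same challenge differ in at most that bound.

```python
import hashlib
import random


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))


class NoisyFunction:
    """Toy noisy function: a fixed response per challenge plus bounded noise."""

    def __init__(self, ell, m, delta, seed=0):
        if not (0 <= delta <= m <= 256):
            raise ValueError("need 0 <= delta <= m <= 256 in this sketch")
        self.ell, self.m, self.delta = ell, m, delta
        self._seed = seed
        self._rng = random.Random(seed)

    def _reference(self, x):
        # Deterministic noise-free response derived from the challenge.
        digest = hashlib.sha256(f"{self._seed}:{x}".encode()).digest()
        return [(digest[i // 8] >> (i % 8)) & 1 for i in range(self.m)]

    def __call__(self, x):
        if len(x) != self.ell or set(x) - {"0", "1"}:
            raise ValueError("challenge must be a bit string of length ell")
        y = self._reference(x)
        # Flip at most delta // 2 bits so two outputs differ in <= delta bits.
        count = self._rng.randint(0, self.delta // 2)
        for i in self._rng.sample(range(self.m), count):
            y[i] ^= 1
        return y
```

Repeated calls with the same challenge return close but not necessarily identical responses, which is the behavior the definition captures.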


Definition 3 (Physically Unclonable Functions). An $(\ell, m, \delta; q_{puf}, \epsilon_{puf})$-family
of PUFs $\mathcal{P}$ is a set of physical realizations of a family of probabilistic
algorithms that fulfills the following algorithmic and physical properties.


Algorithmic properties


— Noise: $\mathcal{P}$ is an $(\ell, m, \delta)$-noisy family of functions with $\delta < m/2$.

— Non-uniform output and independence: There exists a distribution D
on $\{0,1\}^m$ such that for any input $x \in \{0,1\}^\ell$, the following two distributions
on $(\{0,1\}^m)^{q_{puf}}$ can be distinguished with advantage at most $\epsilon_{puf}$.

1. $(\Pi_1(x), \ldots, \Pi_{q_{puf}}(x))$ for adaptively chosen $\Pi_i \in \mathcal{P}$.

2. $(y_1, \ldots, y_{q_{puf}})$ with $y_i \leftarrow D$.
In order to have a practically useful PUF, it should be that $q_{puf} \approx |\mathcal{P}|$, $\epsilon_{puf}$
is negligible and $H_\infty(D) > 0$.


Physical properties


— Unclonability: No efficient technique is known to physically clone any
member $\Pi \in \mathcal{P}$.

— Tamper evidence: For any PUF $\Pi \in \mathcal{P}$, any attempt to externally obtain
its responses or parameters, e.g. by means of a physical attack, will significantly
alter its functionality or destroy it.


A number of constructions for PUFs have been implemented and most of them 
have been experimentally verified to meet the properties of this theoretical def- 
inition. For more details we refer to the literature, e.g. A7BO2IBTHG. One im- 
portant observation we make is that a number of PUF implementations can be 
efficiently implemented on an integrated circuit, e.g. SRAM PUFs [20]. Their 
challenge-response behavior can hence be easily integrated with a chip’s digital 
functionality. 


Remark 1. Due to their physical properties, PUFs became an interesting build- 
ing block for protecting against full non-volatile memory attackers. The basic 
idea is to use a PUF for implicitly storing a secret: instead of putting a secret 
directly into non-volatile memory, it is derived from the PUF responses during 
run time PORI]. 


5 Encrypting with PUFs: A Theoretical Construction 


In the previous section, we explained how to use PUFs for protecting any cryptographic
scheme against full non-volatile memory attackers (see Remark 1). In
the remainder of the paper, we go one step further and explore how to use PUFs 
for protecting against algorithmic attackers in addition. For this purpose, we 
discuss how to use PUFs as a source of reproducible pseudorandomness. This 
approach is motivated by the observation that certain PUFs behave to some 
extent like unpredictable functions. This will allow for constructing (somewhat 
weaker) physical instantiations of (weak) pseudorandom functions. 


5.1 PUF-(w)PRFs 


Pseudorandom functions (PRFs) are important cryptographic primitives
with various applications (see, e.g., [[8§82983]). We recall their definition.


Definition 4 ((Weak) Pseudorandom Functions). Consider a family of
functions $F$ with input domain $\{0,1\}^\ell$ and output domain $\{0,1\}^m$. We say that
$F$ is $(q_{prf}, \epsilon_{prf})$-pseudorandom with respect to a distribution $D$ on $\{0,1\}^m$, if the
advantage to distinguish between the following two distributions for adaptively
chosen pairwise distinct inputs $x_1, \ldots, x_{q_{prf}}$ is at most $\epsilon_{prf}$:


— $y_i = f(x_i)$ where $f \leftarrow F$


— $y_i \leftarrow D$

$F$ is called weakly pseudorandom if the inputs are not chosen by the distinguisher,
but sampled uniformly at random from $\{0,1\}^\ell$ (still under the condition of
being pairwise distinct).

$F$ is called $(q_{prf}, \epsilon_{prf})$-(weakly)-pseudorandom if it is $(q_{prf}, \epsilon_{prf})$-(weakly)-pseudorandom
with respect to the uniform distribution $D = U_m$.


Remark 2. This definition differs in several aspects slightly from the original 
definition of pseudorandom functions, e.g., BEJ. First, specifying the output 
distribution D allows for covering families of functions which have a non-uniform 
output distribution, e.g., PUFs. The original case, as stated in the definition, is
$D = U_m$.

Second, the requirement of pairwise distinct inputs x; has been introduced to 
deal with noisy functions where the same input can lead to different outputs. By 
disallowing multiple queries on the same input, we do not need to model the noise 
distribution, which is sometimes hard to characterize in practice. Furthermore, 
in the case of non-noisy (weak) pseudorandom functions, an attacker gains no 
advantage by querying the same input more than once. Hence, the requirement 
does not limit the attacker in the non-noisy case. 


Observe that the “non-uniform output and independence” assumption on PUFs
(as defined in Definition 3) does not automatically imply (weak) pseudorandomness.
The former considers the unpredictability of the response to a specific
challenge after making queries to several different PUFs, while the latter considers
the unpredictability of the response to a challenge after making queries to
the same PUF.

Obviously, the main obstacle is to convert noisy non-uniform inputs into re- 
liably reproducible, uniformly distributed random strings. For this purpose, we 
make use of an established tool in cryptography, i.e. fuzzy extractors (FE) [12]: 


Definition 5 (Fuzzy Extractor). An $(m, n, \delta; \mu_{FE}, \epsilon_{FE})$-fuzzy extractor $E$ is
a pair of randomized procedures, “generate” $\mathrm{Gen}: \{0,1\}^m \to \{0,1\}^n \times \{0,1\}^*$
and “reproduce” $\mathrm{Rep}: \{0,1\}^m \times \{0,1\}^* \to \{0,1\}^n$.

The correctness property guarantees that for $(z, w) \leftarrow \mathrm{Gen}(y)$ and $y' \in \{0,1\}^m$
with $\mathrm{dist}(y, y') \le \delta$, it holds that $\mathrm{Rep}(y', w) = z$. If $\mathrm{dist}(y, y') > \delta$, then no guarantee is
provided about the output of Rep.

The security property guarantees that for any distribution $D$ on $\{0,1\}^m$ of
min-entropy $\mu_{FE}$, the string $z$ is nearly uniform even for those who observe $w$:


if $(z, w) \leftarrow \mathrm{Gen}(D)$, then it holds that $\mathrm{SD}((z, w), (U_n, w)) \le \epsilon_{FE}$.


PUFs are most commonly used in combination with fuzzy extractor construc- 
tions based on error-correcting codes and universal hash functions. In this case, 
the helper data consists of a code-offset, which is of the same length as the PUF 
output, and the seed for the hash function, which is in the order of 100 bits and 
can often be reused for all outputs. 
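The code-offset construction mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction used in the cited implementations: the error-correcting code is a simple repetition code with factor `REP`, SHA-256 with a random seed stands in for the universal hash function, and the names `gen`, `rep`, `encode`, `decode` and `extract` are chosen here for illustration. The response length is assumed to be a multiple of `REP`.

```python
import hashlib
import secrets

REP = 5  # repetition factor (assumption): each message bit is repeated REP times


def encode(bits):
    """Repetition code: repeat every message bit REP times."""
    return [b for b in bits for _ in range(REP)]


def decode(bits):
    """Majority-decode each block of REP bits back to one message bit."""
    return [int(sum(bits[i:i + REP]) > REP // 2)
            for i in range(0, len(bits), REP)]


def extract(bits, seed):
    """Stand-in for a seeded universal hash (assumption: SHA-256)."""
    return hashlib.sha256(seed + bytes(bits)).digest()


def gen(y):
    """Gen: PUF response y -> (extracted string z, helper data w)."""
    k = len(y) // REP
    codeword = encode([secrets.randbelow(2) for _ in range(k)])
    offset = [a ^ b for a, b in zip(y, codeword)]  # code-offset, as long as y
    seed = secrets.token_bytes(16)
    return extract(y, seed), (offset, seed)


def rep(y_noisy, w):
    """Rep: noisy response y' and helper data w -> z, if y' is close to y."""
    offset, seed = w
    noisy_codeword = [a ^ b for a, b in zip(y_noisy, offset)]
    codeword = encode(decode(noisy_codeword))      # error correction
    y = [a ^ b for a, b in zip(codeword, offset)]  # recovered original response
    return extract(y, seed)
```

With this toy code, `rep` reproduces `z` as long as every block of `REP` response bits contains at most two flipped bits; beyond that, no guarantee is given, in line with the correctness property of Definition 5.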


Theorem 1 (Pseudorandomness of PUF-FE-composition). Let $P$ be an
$(\ell, m, \delta; q_{puf}, \epsilon_{puf})$-family of PUFs which are $(q_{prf}, \epsilon_{prf})$-pseudorandom with respect
to some distribution $D$. Let $E = (\mathrm{Gen}, \mathrm{Rep})$ be an $(m, n, \delta; H_\infty(D), \epsilon_{FE})$
fuzzy extractor. The advantage of any distinguisher that adaptively chooses pairwise
distinct inputs $x_1, \ldots, x_{q_{prf}}$ and receives outputs $(z_1, w_1), \ldots, (z_{q_{prf}}, w_{q_{prf}})$
to distinguish the following two distributions is at most $\epsilon_{prf} + q_{prf} \cdot \epsilon_{FE}$:


— $(z_i, w_i) = \mathrm{Gen}(\Pi(x_i))$ where $\Pi \leftarrow P$
— $(z_i, w_i)$ where $z_i \leftarrow U_n$, $(z'_i, w_i) = \mathrm{Gen}(\Pi(x_i))$ and $\Pi \leftarrow P$

The analogous result holds if $P$ is $(q_{prf}, \epsilon_{prf})$-weak-pseudorandom and if the challenges
$x_i$ are sampled uniformly at random (instead of being adaptively selected),
still under the condition of being pairwise distinct.


Proof. We introduce an intermediate case, named case 1', where $(z_i, w_i) = \mathrm{Gen}(y_i)$
with $y_i \leftarrow D$ and $\Pi \leftarrow P$. Any distinguisher between case 1 and case
1' can be turned into a distinguisher that distinguishes between PUF outputs
and random samples according to $D$. Hence, the advantage is at most $\epsilon_{prf}$ by
assumption. Furthermore, by the usual hybrid argument and the security property
of fuzzy extractors, case 1' and case 2 can be distinguished with advantage
of at most $q_{prf} \cdot \epsilon_{FE}$.
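The two steps of this proof combine through the triangle inequality. Writing $\mathrm{Adv}(A, B)$ for the best advantage in distinguishing case $A$ from case $B$ (a shorthand introduced here only for illustration), the bound of Theorem 1 can be summarised as:

```latex
\begin{align*}
\mathrm{Adv}(\text{case } 1, \text{case } 2)
  &\le \mathrm{Adv}(\text{case } 1, \text{case } 1')
     + \mathrm{Adv}(\text{case } 1', \text{case } 2) \\
  &\le \epsilon_{prf} + q_{prf} \cdot \epsilon_{FE}.
\end{align*}
```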


Definition 6 (PUF-(w)PRFs). Consider a family of (weakly)-pseudorandom
PUFs $P$ and a fuzzy extractor $E = (\mathrm{Gen}, \mathrm{Rep})$ (where the parameters are as described
in Theorem 1). A family of PUF-(w)PRFs is a set of pairs of randomized
procedures, called generation and reproduction. The generation function $\mathrm{Gen} \circ \Pi$
for some PUF $\Pi \in P$ takes as input $x \in \{0,1\}^\ell$ and outputs $(z, w_x) = \mathrm{Gen}(\Pi(x)) \in
\{0,1\}^n \times \{0,1\}^*$, while the reproduction function $\mathrm{Rep} \circ \Pi$ takes $(x, w_x) \in \{0,1\}^\ell \times
\{0,1\}^*$ as input and reproduces the value $z = \mathrm{Rep}(\Pi(x), w_x)$.


Theorem 1 actually shows that PUF-(w)PRFs and “traditional” (w)PRFs have
in common that (part of) the output cannot be distinguished from uniformly 
random values. One might be tempted to plug in PUF-(w)PRFs wherever PRFs 
are required. Unfortunately, things are not that simple since the information 
saved in the helper data is also needed for correct execution. It is a known fact 
that the helper data of a fuzzy extractor always leaks some information about 
the input, e.g., see [23]. Hence, extra attention must be paid when deploying 
PUF-PRFs in cryptographic schemes. In the following section, we describe an 
encryption scheme that achieves real-or-random security although the helper 
data is made public. 


5.2 A Luby-Rackoff Cipher Based on PUF-wPRFs 


A straightforward approach for using PUF-wPRFs against full non-volatile mem- 
ory attackers would be to use them for key derivation where the key is after- 
wards used in some encryption scheme. However, in this construction PUF- 
wPRFs would ensure security against non-volatile memory attackers only while 
the security of the encryption scheme would need to be shown separately. In the 
following, we present a construction that simultaneously protects against algorithmic
and physical attacks while the security in both cases can be reduced to
PUF-wPRF properties.
Fig. 1. A randomized 3-round Luby-Rackoff-cipher based on PUF-PRFs


One of the most important results with respect to PRFs was developed by 
Luby and Rackoff in [B3]. They showed how to construct pseudorandom permu- 
tations from PRFs. Briefly summarized, a pseudorandom permutation (PRP) is 
a PRF that is a permutation as well. PRPs can be seen as an idealization of 
block ciphers. Consequently, the Luby-Rackoff construction is often termed as 
Luby-Rackoff cipher. 

Unfortunately, the Luby-Rackoff result does not automatically apply to the 
case of PUF-PRFs. As explained in the previous section, PUF-(w)PRFs differ 
from (w)PRFs as they additionally need some helper data for correct execution. 
First, it is unclear if and how the existence and necessity of helper data would 
fit into the established concept of PRPs. Second, an attacker might adaptively 
choose plaintexts to force internal collisions and use the information leakage of 
the helper data for checking for these events. 

Nonetheless, we can show that a Luby-Rackoff cipher based on PUF-wPRFs 
also yields a secure block cipher. For this purpose, we consider the set of concrete 
security notions for symmetric encryption schemes that has been presented and 
discussed in [4]. More precisely, we prove that a randomized version of a 3-round 
Luby-Rackoff cipher based on PUF-PRFs fulfills real-or-random indistinguisha- 
bility against a chosen-plaintext attacker. 

In a nutshell, a real-or-random attacker adaptively chooses plaintexts and 
hands them to an encryption oracle. This oracle either encrypts the received 
plaintexts (real case) or some random plaintexts (random case). The encryptions 
are given back to the attacker. Her task is to distinguish between both cases. The 
scheme is real-or-random indistinguishable if the advantage of winning the game 
is negligible (in some security parameter). Next, we first define the considered 
block cipher and prove its security afterwards. 


Definition 7 (3-round PUF-wPRF-based Luby-Rackoff cipher). Let $F$
denote a family of PUF-wPRFs with input and output length $n$. The 3-round
PUF-PRF-based Luby-Rackoff cipher $E^F$ uses three different PUF-wPRFs $f_i \in F$,
$i = 1, 2, 3$, as round functions. The working principle is very similar to the
original Luby-Rackoff cipher and is displayed in Figure 1. The main differences
are twofold. First, at the beginning some uniformly random value $\rho \in \{0,1\}^n$ is
chosen to randomize the right part $R$ of the plaintext. Second, the round functions
are PUF-wPRFs that generate two outputs: $z_i$ and $w_i$.

The ciphertext is $(X, Y, w_1, w_2, w_3, \rho)$. Decryption works similarly to the case of
the “traditional” Luby-Rackoff cipher where the helper data $w_i$ is used together
with the Rep procedure for reconstructing the output $z_i$ of the PUF-PRF $f_i$ and
the value $\rho$ to “derandomize” the input to the first round function $f_1$.
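The data flow of this cipher can be sketched in a few lines. The sketch below is an illustration under explicit assumptions: the wiring of the three rounds is our reading of the textual description and of the proof of Theorem 2 (the figure is not reproduced here), and `ToyPufPrf` is a hypothetical, noise-free software stand-in built from HMAC-SHA256, so its helper data is empty. A real PUF-wPRF would return non-trivial helper data from a fuzzy extractor.

```python
import hashlib
import hmac
import secrets

N = 16  # length of each block half in bytes (assumption: n = 128 bits)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class ToyPufPrf:
    """Noise-free software stand-in for a PUF-wPRF (not a real PUF)."""

    def __init__(self):
        # The random key models the device-specific physical randomness.
        self._k = secrets.token_bytes(32)

    def _response(self, x):
        return hmac.new(self._k, x, hashlib.sha256).digest()[:N]

    def gen(self, x):
        """Return (z, w); w is empty because this toy model has no noise."""
        return self._response(x), b""

    def rep(self, x, w):
        """Reproduce z from the input x and the helper data w."""
        return self._response(x)


def encrypt(f1, f2, f3, L, R):
    rho = secrets.token_bytes(N)   # fresh randomizer for the right half
    x1 = xor(R, rho)
    z1, w1 = f1.gen(x1)
    x2 = xor(L, z1)
    z2, w2 = f2.gen(x2)
    Y = xor(x1, z2)
    z3, w3 = f3.gen(Y)
    X = xor(x2, z3)
    return X, Y, w1, w2, w3, rho


def decrypt(f1, f2, f3, ct):
    X, Y, w1, w2, w3, rho = ct
    x2 = xor(X, f3.rep(Y, w3))
    x1 = xor(Y, f2.rep(x2, w2))
    L = xor(x2, f1.rep(x1, w1))
    R = xor(x1, rho)               # "derandomize" the right half
    return L, R
```

Decryption only works on the device holding the same three round functions, which matches the observation that the scheme suits encrypting data for later use by the same device rather than communication between two parties.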


Since there is no digital secret stored in non-volatile memory, even a full non- 
volatile memory attacker has no advantage in breaking this scheme. Although 
this makes encrypting digital communication between two different parties im- 
possible, various applications are imaginable, e.g., for encrypting data stored in 
untrusted or public storage. 


Theorem 2. Let $E^F$ be the encryption scheme defined in Definition 7 using a
family $F$ of PUF-wPRFs (with parameters as specified in Theorem 1). Then,
the advantage of a real-or-random attacker making up to $q_{prf}$ queries is at most
$4\epsilon_{prf} + 2 q_{prf} \cdot \epsilon_{FE} + 2 \cdot \frac{q_{prf}^2}{2^n}$.

Proof. Let $\{(L^{(i)}, R^{(i)})\}_{i=1,\ldots,q_{prf}}$ denote the sequence of the adaptively chosen
plaintexts, let $x_j^{(i)}$ and $z_j^{(i)}$ be the respective inputs and outputs to round function $f_j$,
and let $\rho^{(i)}$ be the randomly chosen values. We show the claim by defining a sequence
of games and estimating the advantages of distinguishing between them. Let the
real game be the scenario in which the distinguisher receives the encryptions of the
plaintexts she chose.

In game 1, the outputs $z_1^{(i)}$ of the first round function $f_1$ are replaced by some
uniformly random values $\tilde{z}_1^{(i)} \leftarrow \{0,1\}^n$. Under the assumption that the values
$x_1^{(i)}$ are pairwise distinct, the advantage to distinguish between both cases is at
most $\epsilon_{prf} + q_{prf} \cdot \epsilon_{FE}$ according to Theorem 1. Furthermore, as the values $\rho^{(i)}$
are uniformly random, the probability of a collision in the values $x_1^{(i)}$ is at most
$\frac{q_{prf}^2}{2^n}$. As a consequence, the advantage to distinguish between the real game and
game 1 is upper bounded by $\epsilon_{prf} + q_{prf} \cdot \epsilon_{FE} + \frac{q_{prf}^2}{2^n}$.

Game 2 is defined like game 1 where now the inputs $x_1^{(i)}$ to the first round
function $f_1$ are randomized to $\tilde{x}_1^{(i)} \leftarrow \{0,1\}^n$. Observe that the values $x_1^{(i)}$ are
used in two different contexts: i) for computing the right part of the ciphertext
(by XORing with the output of the second round function) and ii) as input to
the first round function. Regarding i), observe that the outputs of the second
round function are independent of the values $x_1^{(i)}$ as the values $z_1^{(i)}$ (and hence
the inputs to $f_2$) are uniformly random by definition and that the values $x_1^{(i)}$ are
independent of the plaintext (because of $\rho^{(i)}$). Hence, i) and ii) represent two
independent features, possibly allowing for distinguishing between game 1 and
game 2, and hence can be examined separately.

The advantage of distinguishing between games 1 and 2 based on i) is equivalent
to deciding whether the values $R^{(i)} \oplus \rho^{(i)} \oplus Y^{(i)}$ are uniformly random or
belong to the outputs of the second round function. With the same arguments
as above, the advantage is upper bounded by $\epsilon_{prf} + q_{prf} \cdot \epsilon_{FE} + \frac{q_{prf}^2}{2^n}$.
The advantage of distinguishing between game 1 and game 2 based on ii) is at
most the advantage of distinguishing $(\Pi_1(x_1^{(1)}), \ldots, \Pi_1(x_1^{(q_{prf})}))$ from
$(\Pi_1(\tilde{x}_1^{(1)}), \ldots, \Pi_1(\tilde{x}_1^{(q_{prf})}))$, where $\Pi_1$ denotes the PUF used in $f_1$. By the definition of wPRFs
(Definition 4), the advantage of distinguishing $(\Pi_1(x_1^{(1)}), \ldots, \Pi_1(x_1^{(q_{prf})}))$ from
$(y_1, \ldots, y_{q_{prf}})$, where $y_i \leftarrow D$ and $D$ being an appropriate distribution, is at most
$\epsilon_{prf}$. Actually, the same holds for $(\Pi_1(\tilde{x}_1^{(1)}), \ldots, \Pi_1(\tilde{x}_1^{(q_{prf})}))$ (the fact that the
values $\tilde{x}_1^{(i)}$ are unknown cannot increase the advantage). Hence, by the triangle
inequality, it follows that the advantage regarding ii) is at most $2\epsilon_{prf}$. In total,
the advantage to distinguish between game 1 and game 2 is less than or equal
to $3\epsilon_{prf} + q_{prf} \cdot \epsilon_{FE} + \frac{q_{prf}^2}{2^n}$.

Finally, observe that it is indistinguishable whether $x_1^{(i)}$ or $R^{(i)}$ is randomized
and likewise whether $z_1^{(i)}$ or $L^{(i)}$. Hence, game 2 is indistinguishable from the
random game where the plaintexts are randomized. Summing up, the advantage
of a real-or-random attacker is at most $4\epsilon_{prf} + 2 q_{prf} \cdot \epsilon_{FE} + 2 \cdot \frac{q_{prf}^2}{2^n}$.
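Adding up the two game hops makes the final bound explicit. Here $\mathrm{Adv}(A, B)$ is a shorthand, assumed only for this summary, for the best advantage in distinguishing game $A$ from game $B$:

```latex
\begin{align*}
\mathrm{Adv}(\text{real}, \text{game } 1)
  &\le \epsilon_{prf} + q_{prf} \cdot \epsilon_{FE} + \frac{q_{prf}^2}{2^n}, \\
\mathrm{Adv}(\text{game } 1, \text{game } 2)
  &\le 3\epsilon_{prf} + q_{prf} \cdot \epsilon_{FE} + \frac{q_{prf}^2}{2^n}, \\
\mathrm{Adv}(\text{real}, \text{random})
  &\le 4\epsilon_{prf} + 2 q_{prf} \cdot \epsilon_{FE} + 2 \cdot \frac{q_{prf}^2}{2^n}.
\end{align*}
```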


6 SRAM PRFs 


In the previous section, we showed that secure cryptographic schemes are pos- 
sible even if helper data is used that leaks information. In this section, we 
show that in the concrete case, information leakage through helper data can 
be avoided completely. We illustrate this approach on SRAM PUFs that were 
originally introduced and experimentally verified in [20]. With respect to our modeling,
an SRAM PUF is a realization of an $(\ell, m, \delta; q_{puf}, \epsilon_{puf})$-PUF that is $(2^\ell, 0)$-pseudorandom.

We introduce a new mode of operation that, similarly to the fuzzy extractor 
approach in the previous section, allows for extracting uniform values from SRAM 
PUFs in a reproducible way. This approach likewise stores some additional helper 
data but, as opposed to the case of fuzzy extractors, the helper data does not leak 
any information on the input. Hence, this construction might be of independent 
interest for SRAM PUF based applications. The proposed construction is based 
on two techniques: Temporal Majority Voting and Excluding Dark Bits. 

We denote the individual bits of a PUF response as $y = (y_0, \ldots, y_{m-1})$, with
$y_i \in \{0,1\}$. When performing a response measurement on a PUF $\Pi$, every bit
$y_i$ of the response is determined by a Bernoulli trial. Every $y_i$ has a most likely
value $y_i^{(ML)} \in \{0,1\}$, and a certain probability $p_i < 1/2$ of differing from this
value which we define as its bit error probability. We denote $y_i^{(k)}$ as the $k$-th
measurement or sample of bit $y_i$ in a number of repeated measurements.


Definition 8 (Temporal Majority Voting (TMV)). Consider a Bernoulli
distributed random bit $y_i$ over $\{0,1\}$. We define temporal majority voting of $y_i$
over $N$ votes, with $N$ an odd positive integer, as a function $\mathrm{TMV}_N : \{0,1\}^N \to
\{0,1\}$, that takes as input $N$ different samples of $y_i$ and outputs the most often
occurring value in these samples.


We can calculate the error probability $p_{N,i}$ of bit $y_i$ after TMV with $N$ votes as:


$$p_{N,i} \stackrel{\text{def}}{=} \Pr\left[\mathrm{TMV}_N\left(y_i^{(1)}, \ldots, y_i^{(N)}\right) \ne y_i^{(ML)}\right] = 1 - \mathrm{Bin}_{N,p_i}\left(\frac{N-1}{2}\right) \le p_i, \qquad (1)$$

with $\mathrm{Bin}_{N,p_i}$ the cumulative distribution function of the binomial distribution.
From Eq. (1) it follows that applying TMV to a bit of a PUF response effectively
reduces the error probability from $p_i$ to $p_{N,i}$, with $p_{N,i}$ becoming smaller as $N$
increases. We can determine the number of votes $N$ we need to reach a certain
threshold $p_T$ such that $p_{N,i} \le p_T$, given an initial error probability $p_i$. It turns
out that $N$ rises exponentially as $p_i$ gets close to 1/2. In practice, we also have to
put a limit $N_T$ on the number of votes we can perform, since each vote involves
a PUF response measurement. We call the pair $(N_T, p_T)$ a TMV-threshold.
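Eq. (1) is easy to evaluate numerically. The following sketch computes the post-voting error probability as the upper tail of a binomial distribution and searches for the smallest odd number of votes that meets a threshold; the function names `tmv_error` and `min_votes` are chosen here for illustration and do not come from the paper.

```python
from math import comb


def tmv_error(n_votes, p):
    """Probability that a majority vote over n_votes samples is wrong."""
    if n_votes % 2 == 0:
        raise ValueError("the number of votes must be odd")
    first_losing = (n_votes + 1) // 2  # this many errors flip the majority
    return sum(comb(n_votes, k) * p**k * (1 - p)**(n_votes - k)
               for k in range(first_losing, n_votes + 1))


def min_votes(p, p_target, n_max):
    """Smallest odd N <= n_max with tmv_error(N, p) <= p_target, else None."""
    for n in range(1, n_max + 1, 2):
        if tmv_error(n, p) <= p_target:
            return n
    return None
```

A return value of `None` from `min_votes` corresponds to the situation in which no admissible number of votes suffices for the given bit, which is what motivates treating such bits separately.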


Definition 9 (Dark Bit (DB)). Let $(N_T, p_T)$ be a TMV-threshold. We define
a bit $y_i$ to be dark with respect to this threshold if $p_{N_T,i} > p_T$.


TMV alone cannot decrease the bit error probability to acceptable levels (e.g.
$\le 10^{-9}$) because of the non-negligible occurrence of dark bits. We use a bit mask
$\gamma$ to identify these dark bits in the generation phase, and exclude them during
reproduction. Similar to fuzzy extractors, $(N_T, p_T)$-TMV and DB can be used
for generating and reproducing uniform values from SRAM PUFs.

The Gen-procedure takes sufficient measurements of every response bit $y_i$
to make an accurate estimate of its most likely value $y_i^{(ML)}$ and of its error
probability $p_i$. If $y_i$ is dark with respect to $(N_T, p_T)$, then the corresponding bit
$\gamma_i$ in the bit mask $\gamma \in \{0,1\}^m$ is set to 0 and $y_i^{(ML)}$ is discarded, otherwise $\gamma_i$ is
set to 1 and $y_i^{(ML)}$ is appended to the bit string $s$. The procedure Gen outputs a
helper string $w = (\gamma, \sigma)$ and an extracted string $z = \mathrm{Extract}_\sigma(s)$, with $\mathrm{Extract}_\sigma$
a classical strong extractor¹ with seed $\sigma$.

The Rep-procedure takes $N_T$ measurements of a response $y'$ and the corresponding
helper string $w = (\gamma, \sigma)$, with $\gamma \in \{0,1\}^m$, as input. If $\gamma_i$ contains a 1,
then the result of $\mathrm{TMV}_{N_T}(y_i'^{(1)}, \ldots, y_i'^{(N_T)})$ is appended to a bit string $s'$,
otherwise, $y'_i$ is discarded. Rep outputs an extracted string $z' = \mathrm{Extract}_\sigma(s')$.
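The Gen and Rep procedures just described can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: `measure` is a hypothetical callback returning one noisy response as a list of bits, `n_est` is the number of measurements used for estimation during Gen, and SHA-256 stands in for the strong extractor $\mathrm{Extract}_\sigma$ (it is not a strong extractor in the formal sense).

```python
import hashlib
import random
from math import comb


def tmv_error(n, p):
    """Error probability after majority voting over n (odd) samples."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n + 1) // 2, n + 1))


def majority(samples):
    return int(2 * sum(samples) > len(samples))


def extract(bits, seed):
    """Stand-in for a strong extractor with seed sigma (assumption: SHA-256)."""
    return hashlib.sha256(seed + bytes(bits)).digest()


def gen(measure, n_est, n_t, p_t, seed):
    """Estimate every bit, mask the dark ones, extract from the rest."""
    runs = [measure() for _ in range(n_est)]
    mask, s = [], []
    for column in zip(*runs):            # all samples of one response bit
        ml = majority(column)            # estimated most likely value
        p_est = sum(b != ml for b in column) / n_est
        if tmv_error(n_t, p_est) > p_t:  # dark bit: exclude it
            mask.append(0)
        else:
            mask.append(1)
            s.append(ml)
    return extract(s, seed), (mask, seed)  # z and w = (gamma, sigma)


def rep(measure, n_t, w):
    """Majority-vote the unmasked bits over n_t measurements and extract."""
    mask, seed = w
    runs = [measure() for _ in range(n_t)]
    s = [majority(col) for col, g in zip(zip(*runs), mask) if g == 1]
    return extract(s, seed)
```

The bit mask depends only on the estimated error probabilities, not on the bit values themselves, which is the property the text relies on when arguing that the mask leaks no min-entropy.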

A strong extractor [B7] is a function that is able to generate nearly-uniform 
outputs from inputs coming from a distribution with limited min-entropy. It 
ensures that the statistical distance of the extracted output to the uniform distribution
is negligible. The required compression rate of $\mathrm{Extract}_\sigma$ depends on
the remaining min-entropy $\mu$ of the PUF response $y$ after the helper data is
observed. We call the above construction a TMV-DB-SRAM-PUF.


' See e.g. BIL for a definition of a strong extractor. Typical seed lengths of strong 
extractors are in the order of 100 bits, and in most cases the same seed can be reused 
for all outputs. 
Using analogous arguments as in Theorem 1, one can show that the output of a
TMV-DB-SRAM-PUF is indistinguishable from random except with negligible 
advantage. Additionally, in an SRAM PUF, the most likely value of a bit is 
independent of whether or not the bit is a dark bit, hence no min-entropy on 
the PUF output is leaked by the bit mask. However, by searching for matching 
helper strings, an adversary might still be able to find colliding TMV-DB-SRAM- 
PUF inputs (especially as the input size is small), which can impose a possible 
security leak. In order to overcome this issue, we present the following way of 
using a TMV-DB-SRAM-PUF: 


Definition 10 (All-at-once mode). Consider a TMV-DB-SRAM-PUF as described
above. We define the all-at-once mode of operation to be the pair of procedures
(Enroll, Eval).

The enrollment procedure Enroll outputs a helper table $\Omega \in \{0,1\}^{2^\ell \times *}$ when
executed. The helper table is constructed by running, for all $x \in \{0,1\}^\ell$, the generation
function $(\mathrm{Gen} \circ \Pi)(x)$, and storing the obtained helper data $w_x$ as the $x$-th element
in $\Omega$, i.e. $\Omega[x] := w_x$.

The evaluation function $\mathrm{Eval}: \{0,1\}^\ell \times \{0,1\}^{2^\ell \times *} \to \{0,1\}^n$ takes an element
$x \in \{0,1\}^\ell$ and a helper table $\Omega \in \{0,1\}^{2^\ell \times *}$ as inputs and (after internal computation)
outputs a value $\mathrm{Eval}(x, \Omega) = z \in \{0,1\}^n$, with $z = (\mathrm{Rep} \circ \Pi)(x, \Omega[x])$.


The Enroll-procedure has to be executed before the Eval-procedure, but it has
to be run only once for every PUF. Every invocation of Eval can take the same
(public) helper table $\Omega$ as one of its inputs. However, in order to conceal exactly
which helper string is used, it is important that the Eval-procedure takes $\Omega$ as a
whole as input, and does not just do a look-up of $\Omega[x]$ in a public table $\Omega$. The
all-at-once mode prevents an adversary from learning which particular helper
string is used during the internal computation.
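The interface of the all-at-once mode can be sketched as below. The sketch only illustrates the calling convention, namely that evaluation receives the complete helper table and performs the look-up internally. `ToyPuf` is a hypothetical noise-free software stand-in for the composed generation and reproduction procedures, and `ELL`, `enroll` and `eval_prf` are names chosen here for illustration.

```python
import hashlib
import secrets

ELL = 8  # challenge length in bits, so the helper table has 2**ELL entries


class ToyPuf:
    """Noise-free software stand-in for the composed Gen and Rep (hypothetical)."""

    def __init__(self):
        # Models the device-specific physical randomness.
        self._device_secret = secrets.token_bytes(32)

    def gen(self, x):
        """Return (z, w_x) for challenge x, with fresh helper data w_x."""
        w = secrets.token_bytes(16)
        return self.rep(x, w), w

    def rep(self, x, w):
        """Reproduce z from challenge x and helper data w."""
        return hashlib.sha256(self._device_secret + bytes([x]) + w).digest()


def enroll(puf):
    """Run generation once for every challenge; keep only the helper data."""
    return [puf.gen(x)[1] for x in range(2 ** ELL)]


def eval_prf(puf, x, table):
    """Evaluation takes the whole table; the look-up happens internally."""
    if len(table) != 2 ** ELL:
        raise ValueError("evaluation expects the complete helper table")
    return puf.rep(x, table[x])
```

In a hardware realization the point of this convention is that an observer of the interface sees the same table on every call, whichever challenge is evaluated.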


Definition 11 (SRAM-PRF). An SRAM-PRF is a TMV-DB-SRAM-PUF 


that runs in the all-at-once mode. 


Using the arguments given above we argue that SRAM-PRFs are, for all practical
purposes, a physical realization of PRFs. Observe that one major drawback
of SRAM-PRFs is that the hardware size grows exponentially with the input
length. Thus, SRAM-PRFs cannot be used as a concrete instantiation of PUF-PRFs
for our construction from Section 5.2. This section rather presents an
alternative approach for constructing cryptographic mechanisms based on PUFs
despite the noise problem. As a possible application of SRAM-PRFs, we discuss
an expanding Luby-Rackoff cipher where the round functions are replaced
by SRAM-PRFs that take 8-bit challenges as input and produce 120-bit extracted
outputs. According to [38], at least 48 rounds are necessary for security
reasons.

As an instantiation for the PUF, we take an SRAM PUF with an assumed
average bit error probability of 15% and an estimated min-entropy content of
0.95 bit/cell. We use a TMV-threshold of $(N_T = 99, p_T = 10^{-9})$. Simulations and


² By consequence, also no min-entropy on the PUF input is leaked.
experiments on the SRAM PUF show that about 30% of the SRAM cells produce
a dark bit with respect to this TMV-threshold. The strong extractor only has to
compress by a factor of $\frac{1}{0.95}$, accounting for the limited min-entropy in the PUF
response. Hence, $\frac{1}{0.95 \cdot 70\%} \cdot 2^8 \cdot 120$ bits $\approx 5.6$ kbyte of SRAM cells is needed to build
one SRAM-PRF. Thus, the entire block cipher uses $48 \cdot 5.6$ kbyte $\approx 271$ kbyte of
SRAM cells. The helper tables also require 5.6 kbyte each.
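The size estimate can be checked with a few lines of arithmetic. The sketch assumes that both the min-entropy of 0.95 bit per cell and the 70% of cells that are not dark enter as overhead factors, and that 1 kbyte means 1024 bytes; under these assumptions the stated figures are reproduced.

```python
INPUT_BITS = 8          # challenge length of one SRAM-PRF
OUTPUT_BITS = 120       # extracted output length per challenge
MIN_ENTROPY = 0.95      # bit of min-entropy per SRAM cell
USABLE_FRACTION = 0.70  # 30% of the cells are dark and get masked out
ROUNDS = 48

# Number of SRAM cells (one bit each) needed for one SRAM-PRF.
cells_per_prf = 2**INPUT_BITS * OUTPUT_BITS / (MIN_ENTROPY * USABLE_FRACTION)
kbyte_per_prf = cells_per_prf / 8 / 1024
kbyte_total = ROUNDS * kbyte_per_prf

print(f"{kbyte_per_prf:.1f} kbyte per SRAM-PRF, {kbyte_total:.0f} kbyte in total")
# prints "5.6 kbyte per SRAM-PRF, 271 kbyte in total"
```

Changing `INPUT_BITS` or `ROUNDS` shows the time-area trade-off: the per-round size grows exponentially in the challenge length but the total grows only linearly in the number of rounds.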

Implementing 48 SRAM PUFs using a total of 271 kbyte of SRAM cells is fea- 
sible on recent ICs, and 48 rounds can be evaluated relatively fast. Storing and 
loading 48 helper tables of 5.6 kbyte each is also achievable in practice. Observe 
that the size depends linearly on the number of rounds. The corresponding parameters
for more rounds can be easily derived. Reducing the input size of the SRAM-PRF 
will yield an even smaller amount of needed SRAM cells and smaller helper tables, 
but the number of rounds will increase. A time-area trade-off is hence possible. 


7 Conclusions 


In this paper we propose a leakage-resilient encryption scheme that makes use 
of Physically Unclonable Functions (PUFs). The core component is a new PUF- 
based cryptographic primitive, termed PUF-PRF, that is similar to a pseudo- 
random function (PRF). We showed that PUF-PRFs possess cryptographically 
useful algorithmic and physical properties that come from the random character 
of their physical structures. 

Of course, any physical model can only approximately describe real life. Al- 
though experiments support our model for the considered PUF implementations, 
more analysis is necessary. In this context it would be interesting to consider 
other types of PUFs which fit into our model or might be used for other cryptographic
applications. Furthermore, a natural continuation of this work would be
to explore other cryptographic schemes based on PUF-PRFs, e.g., hash functions
or public key encryption.
Abstract. A leakage-resilient cryptosystem remains secure even if arbitrary,
but bounded, information about the secret key (and possibly other
internal state information) is leaked to an adversary. Denote the length 
of the secret key by n. We show: 


— A full-fledged signature scheme tolerating leakage of $n - n^\epsilon$ bits of
information about the secret key (for any constant $\epsilon > 0$), based on
general assumptions.

— A one-time signature scheme, based on the minimal assumption of
one-way functions, tolerating leakage of $(\frac{1}{4} - \epsilon) \cdot n$ bits of information
about the signer’s entire state.

— A more efficient one-time signature scheme, that can be based on
several specific assumptions, tolerating leakage of $(\frac{1}{2} - \epsilon) \cdot n$ bits of
information about the signer’s entire state.


The latter two constructions extend to give leakage-resilient t-time sig- 
nature schemes. All the above constructions are in the standard model. 


1 Introduction 


Proofs of security for cryptographic primitives traditionally treat the primitive 
as a “black box” that an adversary is able to access in a relatively limited fash- 
ion. For example, in the usual model for proving security of signature schemes, 
an adversary is given the public key and allowed to request signatures on any 
messages of its choice, but is unable to get any other information about the se- 
cret key or any internal randomness or state information used during signature 
generation. 

In real-world implementations of cryptographic primitives, on the other hand, 
an adversary may be able to recover a significant amount of additional informa- 
tion not captured by standard security models. Examples include information 
leaked by side-channel cryptanalysis BOT], fault attacks [JB], or timing at- 
tacks Ø], or even bits of the secret key itself in case this key is improperly stored 
or erased [17]. Potentially, schemes can also be attacked when they are implemented
using poor random number generation (which can be viewed as
giving the adversary additional information on the internal state, beyond what 
would be available if the output were truly random), or when the same key is 
used in multiple contexts (e.g., for decryption and signing). 

In the past few years, cryptographers have made tremendous progress to- 
ward modeling security in the face of such information leakage [25J35], and in 
constructing leakage-resilient cryptosystems secure even in case such leakage oc- 
curs. (There has also been corresponding work on reducing unwanted leakage 
by, e.g., building tamper-proof hardware; this is not the focus of our work.) 
Most relevant to the current work is a recent series of results 
showing cryptosystems that guarantee security even when arbitrary informa- 
tion about the secret key is leaked (under suitable restrictions); we discuss this 
work, along with other related results, in further detail below. This prior work 
gives constructions of stream ciphers [IBI] (and hence stateful symmetric-key 
encryption and MACs), symmetric-key encryption schemes [9], public-key en- 
cryption schemes [IJI0[26], and signature schemes [2] achieving various notions 
of leakage resilience. 

Most prior work has focused on primitives for ensuring secrecy. The only work 
of which we are aware that deals with authenticity is that of Alwen et al. [2] which 
shows, among other results, leakage-resilient signature schemes based on number-theoretic
assumptions in the random oracle model.¹ Here we give constructions of
leakage-resilient signature schemes based on general assumptions in the standard 
model; our main construction also tolerates more leakage than the schemes of [2]. 
(In the full version we also show some technical improvements to the results 
of [2].) We postpone a more thorough discussion of our results until after we
define leakage resilience in more detail. 


1.1 Modeling Leakage Resilience 


At a high level, definitions of leakage resilience take the following form: Begin
with a “standard” security notion (e.g., existential unforgeability under adaptive
chosen message attacks [15]) and modify this definition by allowing the adversary
to (adaptively) specify a series of leakage functions $f_1, \ldots$. The adversary,
in addition to getting whatever other information is specified by the original
security definition, is given the result of applying $f_i$ to the secret key and possibly
other internal state of the honest party (e.g., the signer). We then require
that the adversary’s success probability (for signature schemes, the probability
with which it can output a forged signature) remain negligible. It should be
clear that this is a general methodology that can be applied to many different
primitives. The exact model is then determined by the restrictions placed on the
leakage function(s) $f_i$:


Limited vs. arbitrary information. A first consideration regards whether 
the {fi} can be arbitrary (polynomial-time computable) functions, or whether 


1 The results of [2] were obtained independently of our own work. 
they are restricted to be in some more limited class. Early work considered
the latter case, for example where the adversary is restricted to learning spe- 
cific bits of the secret key [6], or the values on specific wires of the circuit 
implementing the primitive [I9]. More recent work allows 
arbitrary {fi}. 


Bounded vs. unbounded information leakage. Let n denote the length of 
the secret key. If the secret key does not change over time, and the $\{f_i\}$ are
allowed to be arbitrary, then security in the traditional sense cannot be achieved
once the total length of the leakage (that is, the outputs of all the $\{f_i\}$) is
n bits or more. For the case of signatures, the length of the leakage must also 
be less than the signature length. This inherent restriction is used in [MOBA]. 
(Alwen et al. do not impose this restriction, but as a consequence can only 
achieve a weaker notion of security.) 
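The bounded-leakage setting can be pictured as an oracle that answers leakage queries until a fixed bit budget is exhausted. The sketch below is only an illustration of that bookkeeping: the class `LeakageOracle`, its method `leak` and the integer encoding of the secret key are assumptions made here, not part of any formal definition in the text.

```python
class LeakageOracle:
    """Answers leakage queries f_i(sk) until a total bit budget is used up."""

    def __init__(self, sk, budget_bits):
        self._sk = sk                  # secret key, modelled as an integer
        self.remaining = budget_bits   # total leakage still allowed, in bits

    def leak(self, f, out_bits):
        """Apply the leakage function f to sk; its output has out_bits bits."""
        if out_bits > self.remaining:
            raise PermissionError("leakage bound exceeded")
        value = f(self._sk)
        if not 0 <= value < 2 ** out_bits:
            raise ValueError("leakage function output is too long")
        self.remaining -= out_bits
        return value
```

The leakage functions themselves are arbitrary; only the total length of their outputs is restricted, which is what distinguishes this model from one that limits the adversary to specific key bits or wires.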

One can avoid this restriction, and potentially tolerate an unbounded amount 
of leakage overall, if the secret key is updated over time; even in this case, one 
must somehow limit the amount of leakage between successive key updates. This 
approach to leakage resilience was considered in [B]] in the context of stateful 
symmetric-key primitives, and [I2] in the context of stateful signature schemes. 

One can also avoid imposing a bound on the leakage by restricting the { fi}, 
as discussed next. 


Computational min-entropy of the secret key. If the leakage is much 
shorter than the secret key (as discussed above), then the secret key will have 
high min-entropy conditioned on the leakage. This setting is considered in 
moo], and is also enforced on a per-period basis in the work of 0B] 
(i.e., the leakage per time period is required to be shorter than the secret key). 
More recent work shows schemes that remain secure for leakage of arbi- 
trary length, as long as the secret key remains exponentially hard to compute 
given the leakage (but even if the secret key is fully determined by the leakage 
in an information-theoretic sense). A drawback of this guarantee is that given 
some collection of functions {f;} (say, as determined experimentally for some 
particular set of side-channel attacks) there is no way to tell, in general, whether 
they satisfy the stated requirement or not. Furthermore, existing results in this 
direction currently require super-polynomial hardness assumptions. 


Inputs to the leakage functions. A final issue is the allowed inputs to the 
leakage functions. Work of [OBI] assumes, following 25], that only computa- 
tion leaks information; this is modeled by letting each f; take as input only 
those portions of the secret key that are accessed during the ith phase of the 
scheme. Halderman et al. [Z], however, show that memory contents can be 
leaked even when they are not being accessed. Motivated (in part) by this re- 
sult, the schemes of allow the {fi} to take the entire secret key as 
input at all times. 

For the specific primitives considered in [LITIBTTIORG, the secret key sk 
is the only internal state maintained by the party holding the secret key, and 
so allowing the $\{f_i\}$ to depend on sk is (almost) the most general choice.² For
signature schemes, however, any randomness used during signing might also be
leaked to an adversary. The strongest definition of leakage resilience is thus
obtained by allowing the $\{f_i\}$ to depend on all the state information used by
the honest signer during the course of the experiment.

All these variants may be meaningful depending on the particular attacks one 
is trying to model. Memory attacks [[7J, which probe long-term secret infor- 
mation during a time when computation is not taking place, can be faithfully 
modeled by allowing the leakage functions to take only sk as input. On the 
other hand, side-channel attacks that collect information while computation is 
occurring might be more accurately captured by allowing the leakage functions 
to take as input only those portions of the internal state that are being accessed. 


1.2 Our Results 


With the preceding discussion in mind, we can now describe our results in further 
detail. In all cases, we allow the leakage function(s) to be arbitrary as long as 
the total leakage is bounded as some function of the secret key length n; recall 
that such a restriction on the leakage is essential if the secret key is unchanging, 
as it is in all our schemes. Our results can be summarized as follows: 


1. We show a construction of a leakage-resilient signature scheme that is
existentially unforgeable against chosen-message attacks in the standard model,
based on general (as opposed to number-theoretic) assumptions. This scheme
tolerates leakage of n − n^ε bits of information about the secret key for any
ε > 0 based on polynomial hardness assumptions, and can tolerate (optimal)
n − ω(log n) bits of leakage based on sub-exponential hardness assumptions.

2. We also construct two leakage-resilient one-time (resp., t-time) signature
schemes in the standard model. These schemes are more efficient than the
scheme above; they also tolerate leakage that may depend on the entire state
of the signer (rather than just the secret key).

– Our first scheme is based on the minimal assumption that one-way functions
exist, and tolerates leakage of (1/4 − ε)·n bits for any ε > 0. The
construction extends to give a t-time signature scheme tolerating leakage
of O(n/t) bits.

– Our second scheme, which can be based on various concrete assumptions,
is more efficient and tolerates leakage of up to (1/2 − ε)·n bits for any
ε > 0. This construction also extends to give a t-time signature scheme
tolerating leakage of O(n/t) bits.


In the full version of this work, we also discuss efficient constructions of
full-fledged signature schemes based on number-theoretic assumptions (in the
random oracle model) that are secure as long as the leakage is bounded by (1/2 − ε)·n


2 More generally, one could also allow the {f_i} to depend on the randomness used to
generate the (public and) secret key(s); this possibility is mentioned in Section
8.2]. (For the specific schemes considered in MBID], however, this
makes no substantive difference.)
bits for any ε > 0. Similar schemes were discovered independently by Alwen et
al. B], but our analysis offers some advantages as compared to theirs. Specifi- 
cally, we make explicit the fact that the leakage can depend on the entire state 
of the signer, and we allow leakage queries to depend on the random oracle. 
Independent of our work, Faust et al. [2] describe a transformation from any 
3-time signature scheme tolerating a(n) bits of leakage to a full-fledged (but 
stateful) signature scheme where the secret key is updated over time; the result- 
ing scheme tolerates a(n) bits of leakage between key updates, and unbounded 
leakage overall. (In the transformed signature scheme, security is ensured as long 
as the leakage depends only on the active portion of the secret-key.) Applying 
this transformation to our constructions, we get full-fledged signature schemes 
that tolerate unbounded leakage (subject to the restrictions mentioned above). 


1.3 Overview of Our Techniques 


Our constructions all rely on the same basic idea. Roughly, we consider signature 
schemes with the following properties: 


– A given public key pk corresponds to a set S_pk of exponentially many secret
keys. Furthermore, given (sk, pk) with sk ∈ S_pk it remains hard to compute
any other sk' ∈ S_pk.

– The secret key sk used by the signer has high min-entropy (at least in a
computational sense) even for an adversary who observes signatures on messages
of its choice. (For our one-time scheme, this is only required to hold
for an adversary who observes a single signature.)

– A signature forgery can be used to compute a secret key in S_pk.


To prove that any such signature scheme is leakage resilient, we show how to
use an adversary A attacking the scheme to find distinct sk, sk' ∈ S_pk given
(sk, pk) (in violation of the assumed hardness of doing so). Given (sk, pk), we
simply run A on input pk and respond to its signing queries using the given
key sk. Leakage queries can also be answered using sk. If the adversary forges
a signature, we extract some sk' ∈ S_pk; it remains only to show that sk' ≠ sk
with high probability. Let n = log |S_pk| be the (computational) min-entropy of sk
conditioned on pk and the signatures seen by the adversary. (We assume that all
secret keys in S_pk are equally likely, which will be the case in our constructions.)
A standard argument (cf. Lemma 1) shows that if the leakage is bounded by ℓ
bits, then the conditional min-entropy of the secret key is still at least n − ℓ − t
bits except with probability 2^{−t}. So as long as the leakage is bounded away
from n, with high probability the min-entropy of sk conditioned on A's entire
view is still at least 1. But then sk' ≠ sk with probability at least 1/2. This
concludes the outline of the proof. We remark, however, that various subtleties
arise in the formal proofs of security.

Some existing signature schemes in the random oracle model already satisfy 
the requirements stated above. In particular, these include schemes constructed 
using the Fiat-Shamir transform [I3] applied to a witness-indistinguishable
Σ-protocol where there are an exponential number of witnesses for a given
statement. Concrete examples include the signature schemes of Okamoto
(extending the Schnorr B4 and Guillou-Quisquater schemes) based on the 
discrete logarithm or RSA assumptions, as well as the signature scheme of 
Fischlin and Fischlin [4 (extending the Ong-Schnorr scheme) based on 
the hardness of factoring. This class of schemes was also considered by Alwen et 
al. [2]. See the full version of our paper for further discussion. 

We are not aware of any existing signature scheme in the standard model
that meets our requirements. We construct one as follows. Let H be a universal
one-way hash function (UOWHF) [27] mapping n-bit inputs to n^ε-bit outputs.
The secret key of the signature scheme is x ∈ {0,1}^n, and the public key is
(y = H(x), pk, r) where pk is a public key for a CPA-secure public-key encryption
scheme, and r is a common reference string for an unbounded simulation-sound
NIZK proof system [B38]. A signature on a message m consists of an encryption
C ← Enc_pk(m‖x) of both m and x, along with a proof π that C is an encryption
of m‖x' with H(x') = y. Observe that, with high probability over choice of x,
there are exponentially many pre-images of y = H(x) and hence exponentially
many valid secret keys; furthermore, finding another such secret key sk' ≠ sk
requires finding a collision in H. Details are given in Section 3.

Our leakage-resilient one-time signature schemes are constructed using a similar
idea. The first construction is inspired by the Lamport signature scheme [23].
The secret key is {(x_{i,0}, x_{i,1})}_{i=1}^k and the public key is {(y_{i,0}, y_{i,1})}_{i=1}^k where
y_{i,b} = H(x_{i,b}) for H a UOWHF. Once again, there are exponentially many secret
keys associated with any public key and finding any two such keys yields
a collision in H. Adapting the Lamport scheme, so that the signature on a
message m = m_1 ⋯ m_k is {x_{i,m_i}}_{i=1}^k, yields a signature scheme secure against
leakage of n^{1−ε} bits. By first encoding the message using an error-correcting code
with high minimum distance, it is possible to “boost” the leakage resilience to
(1/4 − ε)·n bits. Using cover-free families this approach extends also to give a
leakage-resilient t-time signature scheme. These constructions are all described
in Section 4.1.

Our second construction builds on ideas that can be traced back to MBA.
Roughly, let (G, ⊕) and (G', ⊗) be groups with log|G'| ≤ ε·log|G|, and let
H = {H_s : G → G'} be a family of collision-resistant hash functions that are also
homomorphic (i.e., for which H_s(a) ⊗ H_s(b) = H_s(a ⊕ b)); such hash functions can
be constructed based on a variety of concrete assumptions (see Section 4.3). The
secret key is a pair of elements a, b ∈ G, and the public key is (s, H_s(a), H_s(b))
for a random key s. Note that there are exponentially many secret keys associated
with any public key and finding any two such secret keys yields a collision in H_s.
The signature on a message m ∈ {1, ..., ord(G)} is simply σ = a ⊕ mb, which can
be verified by checking that H_s(σ) = H_s(a) ⊗ mH_s(b). The important property
for our purposes is that given a single signature a ⊕ mb, the secret key (a, b)
still has high min-entropy. So if the adversary forges another signature σ' for a
message m' ≠ m, with high probability it holds that σ' ≠ a ⊕ m'b and we obtain
a collision in H_s.


Signature Schemes with Bounded Leakage Resilience 709 
2 Definitions and Preliminaries 


We provide a formal definition of leakage resilience for signature schemes, and 
state a technical lemma that will be used in our analysis. We denote the security 
parameter by k, and let PPT stand for “probabilistic polynomial time”. 


Definition 1. A signature scheme is a tuple of PPT algorithms (Gen, Sign, Vrfy) 
such that: 


– Gen is a randomized algorithm that takes as input 1^k and outputs (pk, sk),
where pk is the public key and sk is the secret key.

– Sign is a (possibly) randomized algorithm that takes as input the secret key
sk, the public key pk, and a message m, and outputs a signature σ. We
denote this by σ ← Sign_sk(m), leaving the public key implicit.³

– Vrfy is a deterministic algorithm that takes as input a public key pk, a message
m, and a purported signature σ. It outputs a bit b indicating acceptance
or rejection, and we write this as b := Vrfy_pk(m, σ).


It is required that for all k, all (pk, sk) output by Gen(1^k), and all messages m
in the message space, we have Vrfy_pk(m, Sign_sk(m)) = 1.


Our definition of leakage resilience is the standard notion of existential
unforgeability under adaptive chosen-message attacks [I5], except that we additionally
allow the adversary to specify arbitrary leakage functions {f_i} and obtain
the value of these functions applied to the secret key (and possibly other state
information).


Definition 2. Let Π = (Gen, Sign, Vrfy) be a signature scheme, and let λ be a
function. Given an adversary A, define the following experiment parameterized
by k:


1. Choose r ← {0,1}^* and compute (pk, sk) := Gen(1^k; r). Set state := {r}.
2. Run A(1^k, pk). The adversary may then adaptively access a signing oracle
Sign_sk(·) and a leakage oracle Leak(·) that have the following functionality:

– In response to the ith query Sign_sk(m_i), this oracle chooses random r_i ←
{0,1}^*, computes σ_i := Sign_sk(m_i; r_i), and returns σ_i to A. It also sets
state := state ∪ {r_i}.

– In response to the ith query Leak(f_i) (where f_i is specified as a circuit),
this oracle gives f_i(state) to A. (To make the definition meaningful in
the random oracle model, the {f_i} are allowed to be oracle circuits that
depend on the random oracle H.)

The {f_i} can be arbitrary, subject to the restriction that the total output
length of all the f_i is at most λ(|sk|).
3. At some point, A outputs (m, σ).


3 Usually one assumes without loss of generality that the public key is included as part 
of the secret key. Since we measure leakage as a function of the secret-key length, 
however, we seek to minimize the size of the secret key. 
A succeeds if (1) Vrfy_pk(m, σ) = 1 and (2) m was not previously queried to the
Sign_sk(·) oracle. We denote the probability of this event by Pr[Succ^{λ-leakage}_{A,Π}(k)].
We say Π is fully λ-leakage resilient if Pr[Succ^{λ-leakage}_{A,Π}(k)] is negligible for every
PPT adversary A.

If state is not updated after each signing query (and therefore, always contains
only the randomness r used to generate the secret key), we denote the
probability of success by Pr[Succ^{λ-memory}_{A,Π}(k)] and say Π is λ-leakage resilient if
Pr[Succ^{λ-memory}_{A,Π}(k)] is negligible for every PPT adversary A.
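The bookkeeping performed by the leakage oracle in this experiment can be sketched in Python as follows. This is only an illustration under assumptions: the class name LeakageOracle, the representation of state as a list of byte strings, and the convention that each leakage function comes with a declared output length in bits are hypothetical choices made for the sketch and are not part of the definition.

```python
from typing import Callable, List


class LeakageOracle:
    """Toy leakage oracle: answers Leak(f) on the signer's state while
    enforcing a bound on the total number of leaked bits."""

    def __init__(self, initial_randomness: bytes, budget_bits: int):
        self.state: List[bytes] = [initial_randomness]  # state := {r}
        self.remaining = budget_bits                    # plays the role of lambda(|sk|)

    def record_signing_randomness(self, r_i: bytes) -> None:
        # Full leakage resilience: state := state U {r_i} after each signature.
        self.state.append(r_i)

    def leak(self, f: Callable[[List[bytes]], int], out_bits: int) -> int:
        # f is an arbitrary function of the state with out_bits bits of output.
        if out_bits > self.remaining:
            raise ValueError("leakage budget exceeded")
        self.remaining -= out_bits
        return f(self.state) & ((1 << out_bits) - 1)    # truncate to out_bits


oracle = LeakageOracle(initial_randomness=b"\x0f\xf0", budget_bits=8)
low_bits = oracle.leak(lambda st: st[0][0], out_bits=4)
```

Dropping the call to record_signing_randomness corresponds to the weaker variant in which state always contains only the key-generation randomness r.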

Leakage resilience in the definition above corresponds to the memory attacks of
[Tl] (except that we allow the leakage to depend also on the random coins used
to generate the secret key). Other variations of the definition are, of course,
also possible: state could include only sk (and not the random coins r used to
generate it), or could include only the most recently used random coins r_i.


2.1 A Technical Lemma 


Let X be a random variable taking values in {0,1}^n. The min-entropy of X is


H_∞(X) := min_{x ∈ {0,1}^n} {−log_2 Pr[X = x]}.


The conditional min-entropy of X given an event E is defined as:


H_∞(X | E) := min_{x ∈ {0,1}^n} {−log_2 Pr[X = x | E]}.


Lemma 1. Let X be a random variable with H := H_∞(X), and fix Δ ∈ [0, H].
Let f be a function whose range has size 2^λ, and set


Y := {y ∈ {0,1}^λ | H_∞(X | y = f(X)) ≤ H − Δ}.


Then
Pr[f(X) ∈ Y] ≤ 2^{λ−Δ}.


In words: the probability that knowledge of f(X) decreases the min-entropy of
X by Δ or more is at most 2^{λ−Δ}. Put differently, the min-entropy of X after
observing f(X) is greater than H' except with probability at most 2^{λ−H+H'}.


Proof. Fix y in the range of f and x ∈ {0,1}^n with f(x) = y. Since


Pr[X = x | y = f(X)] = Pr[X = x] / Pr[y = f(X)],


we have that y ∈ Y only if Pr[y = f(X)] ≤ 2^{−Δ}. The assumption regarding the
range of f implies |Y| ≤ 2^λ, and so Pr[f(X) ∈ Y] ≤ 2^{λ−Δ} as claimed.
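As a sanity check, the bound of Lemma 1 can be evaluated by brute force on a small distribution. The sketch below is illustrative only: the helper name bad_output_probability, the choice of a uniform X over 8-bit strings, and the two example leakage functions are assumptions of the example, not part of the lemma.

```python
import math
from collections import defaultdict


def bad_output_probability(probs, f, delta):
    """Return Pr[f(X) in Y], where Y is the set of outputs y for which the
    min-entropy of X conditioned on f(X) = y is at most H - delta."""
    h = -math.log2(max(probs.values()))           # H = H_inf(X)
    by_output = defaultdict(dict)
    for x, p in probs.items():
        by_output[f(x)][x] = p
    total = 0.0
    for y, group in by_output.items():
        p_y = sum(group.values())                 # Pr[f(X) = y]
        h_cond = -math.log2(max(group.values()) / p_y)
        if h_cond <= h - delta + 1e-12:           # small tolerance for floats
            total += p_y
    return total


n, lam, delta = 8, 2, 3
uniform = {x: 1 / 2**n for x in range(2**n)}

# Leak the top two bits of x: every output is hit with probability 1/4.
p_top = bad_output_probability(uniform, lambda x: x >> (n - lam), delta)

# A skewed 2-bit leakage: output 3 only when x == 0, output 0 otherwise.
p_skew = bad_output_probability(uniform, lambda x: 3 if x == 0 else 0, delta)

bound = 2 ** (lam - delta)
```

In the skewed case the rare output pins down x completely, but it occurs with probability only 2^{-8}, which is well within the bound 2^{λ−Δ}.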




3 A Leakage-Resilient Signature Scheme 


We construct a leakage-resilient signature scheme in the standard model, following
the intuition described in Section 1.3. Let (Gen_H, H) be a public-coin⁴
UOWHF [7] mapping n-bit inputs to (1/2)·n^ε-bit outputs for n = poly(k) and
ε ∈ (0,1). Let (Gen_E, Enc, Dec) be a CPA-secure, dense⁵ public-key encryption
scheme, and let (ℓ, P, V, S_1, S_2) be an unbounded simulation-sound NIZK proof
system [8] for the following language L:


L = {(s, y, pk, m, C) : ∃x, ω s.t. C = Enc_pk(x; ω) and H_s(x) = y}.

The signature scheme is defined as follows:


Key generation: Choose random x ← {0,1}^n and compute s ← Gen_H(1^k).
Obliviously sample a public key pk for the encryption scheme, and choose
a random string r ← {0,1}^{ℓ(k)}. The public key is (s, y := H_s(x), pk, r) and
the secret key is x.

Signing: To sign message m using secret key x and public key (s, y, pk, r),
first choose random ω and compute C := Enc_pk(x; ω). Then compute π ←
P_r((s, y, pk, m, C), (x, ω)); i.e., π is a proof that (s, y, pk, m, C) ∈ L using
witness (x, ω). The signature is (C, π).

Verification: Given a signature (C, π) on the message m with respect to the
public key (s, y, pk, r), output 1 iff V_r((s, y, pk, m, C), π) = 1.


Theorem 1. Under the stated assumptions, the scheme above is (n − n^ε)-leakage
resilient.


Proof (Sketch). Let Π denote the scheme given above, and let A be a PPT
adversary with δ = δ(k) := Pr[Succ^{λ-memory}_{A,Π}(k)]. We consider a sequence of
experiments, and let Pr_i[·] denote the probability of an event in experiment i. We
abbreviate Succ^{λ-memory}_{A,Π}(k) by Succ.


Experiment 0: This is the experiment of Definition 2. Given the public key
(s, y, pk, r) defined by the experiment, Succ denotes the event that A outputs
(m, (C, π)) where V_r((s, y, pk, m, C), π) = 1 and m was never queried to the signing
oracle. By assumption, we have Pr_0[Succ] = δ.


Experiment 1: We introduce the following differences with respect to the preceding
experiment: when setting up the public key, we now generate the common
random string r of the simulation-sound NIZK by computing (r, τ) ← S_1(1^k).
Furthermore, signing queries are now answered as follows: to sign m, generate
C ← Enc_pk(x) as before but compute π as π ← S_2((s, y, pk, m, C), τ).


4 For a public-coin UOWHF (cf. [[8]), it is hard to find a second pre-image even given
the randomness used to generate the hash key. Standard constructions of UOWHFs
have this property.

5 This means it is possible to sample a public key “obliviously,” without knowing the 
corresponding secret key. 
It follows from the (adaptive) zero-knowledge property of (ℓ, P, V, S_1, S_2) that
the difference |Pr_1[Succ] − Pr_0[Succ]| must be negligible.


Experiment 2: We modify the preceding experiment in the following way: to
answer a signing query for a message m, compute C ← Enc_pk(0^n) (and then
compute π as in Experiment 1). CPA-security of the encryption scheme implies
that |Pr_2[Succ] − Pr_1[Succ]| is negligible.


Experiment 3: We now change the way the public key is generated. Namely,
instead of obliviously sampling the encryption public key pk we compute it as
(pk, sk) ← Gen_E(1^k). Note that this is only a syntactic change and so Pr_3[Succ] =
Pr_2[Succ]. (This assumes perfect oblivious sampling; if an obliviously generated
public key and a legitimately generated public key are only computationally
indistinguishable, then the probability of Succ is affected by a negligible amount.)

Given the public key (s, y, pk, r) defined by the experiment, let Ext be the event
that A outputs (m, (C, π)) such that the event Succ occurs and furthermore,
H_s(Dec_sk(C)) = y. Unbounded simulation soundness of the NIZK proof system
implies that |Pr_3[Ext] − Pr_3[Succ]| is negligible. (Note that by definition of L the
message m is included as part of the statement being proved, and so if A did
not request a signature on m then it was never given a simulated proof of the
statement (s, y, pk, m, C).)

To complete the proof, we show that Pr_3[Ext] is negligible. Consider the following
adversary B finding a second preimage in the UOWHF: B chooses random
x ← {0,1}^n and is given key s (along with the randomness used to generate s).
B then runs Experiment 3 with A. In this experiment all signatures given to A
are simulated (as described in Experiment 3 above); furthermore B can easily
answer any leakage queries made by A since B knows a legitimate secret key.
(Recall that here we allow the leakage functions to be applied only to [the randomness
used to generate] the secret key, but not to any auxiliary state used
during signing.) If event Ext occurs when A terminates, then B recovers a value
x' := Dec_sk(C) for which H_s(x') = y = H_s(x); i.e., B recovers such an x' with
probability exactly Pr_3[Ext]. We now argue that x' ≠ x with high probability.

The only information about x revealed to A in Experiment 3 comes from the
value y included in the public key and the leakage queries asked by A; these total
at most (1/2)·n^ε + (n − n^ε) = n − (1/2)·n^ε bits. Using Lemma 1 with H = H_∞(x) = n, the
probability that H_∞(x | A's view) = 0 (i.e., the probability that x is uniquely
determined by the view of A) is at most 2^{−n^ε/2}, which is negligible. When the
conditional min-entropy of x is greater than 0 there are at least two (equally
probable) possibilities for x and so x' ≠ x with probability at least 1/2. Taken
together, the probability that B recovers x' ≠ x with H_s(x') = H_s(x) is at least


(1/2) · (Pr_3[Ext] − 2^{−n^ε/2}).


We thus see that if Pr_3[Ext] is not negligible then B violates the security of the
UOWHF with non-negligible probability, a contradiction.




If we are willing to rely on sub-exponential hardness assumptions, we can
construct a UOWHF with ω(log n)-bit outputs. In that case, the same signature
scheme tolerates (optimal) leakage of n − ω(log n) bits.


4 Fully Leakage-Resilient Bounded-Use Signature 
Schemes 


In this section we describe constructions of fully leakage-resilient one-time and 
t-time signature schemes. These results are incomparable to the result of the 
previous section: on the positive side, here we achieve full leakage resilience 
(that is, where the leakage depends not only on the secret-key, but also on the 
randomness used by the signer) as well as better efficiency (and, in one case, rely 
on weaker assumptions); on the downside, the schemes given here are only secure 
when the adversary obtains a bounded number of signatures, and the leakage 
that can be tolerated is lower. 


4.1 A Construction Based on One-Way Functions 


We describe a basic one-time signature scheme, and then present an extension 
that tolerates leakage of up to a constant fraction of the secret key length. Let 
(Gen_H, H) be a UOWHF mapping k^c-bit inputs to k-bit outputs for some c > 1.
(As before, we assume that H is a public-coin UOWHF, i.e., it is secure even
given the randomness used to generate the hash key.) Our basic scheme is a
variant on Lamport's signature scheme [23], using H as the one-way function:


Key generation: Choose random x_{i,0}, x_{i,1} ← {0,1}^{k^c} for i = 1, ..., k, and
generate s ← Gen_H(1^k). Compute y_{i,b} := H_s(x_{i,b}) for i ∈ {1, ..., k} and
b ∈ {0,1}. The public key is (s, {y_{i,b}}) and the secret key is {x_{i,b}}.

Signing: The signature on a k-bit message m = m_1 ⋯ m_k consists of the k
values x_{1,m_1}, ..., x_{k,m_k}.

Verification: Given a signature x_1, ..., x_k on the k-bit message m = m_1 ⋯ m_k
with respect to the public key (s, {y_{i,b}}), output 1 iff y_{i,m_i} = H_s(x_i) for all i.


It can be shown that the above scheme is fully n^{(c−1)/(c+1)}-leakage resilient (as
a one-time signature scheme), where n = 2k^{c+1} denotes the length of the secret
key. Setting c appropriately, the above approach thus tolerates leakage n^{1−ε}
for any desired ε > 0. (We omit the proof, since we will prove security for an
improved scheme below.) The bound on the leakage is essentially tight, since
an adversary who obtains the signature on the message 0^k and then leaks the
value x_{1,1} (which is only k^c = (n/2)^{c/(c+1)} bits) can forge a signature on the
message 10^{k−1}.
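The basic scheme and the tightness remark can be illustrated with a short Python sketch. Everything concrete here is an assumption made for the example: SHA-256 with a prepended key stands in for the UOWHF H_s, the message length K = 16 and the 64-byte secret strings are toy parameters, and the function names are hypothetical.

```python
import hashlib
import secrets

K = 16           # message length in bits (toy value)
IN_BYTES = 64    # length of each secret string x_{i,b} (stand-in for k^c bits)


def H(s: bytes, x: bytes) -> bytes:
    # Stand-in for the keyed hash H_s; using SHA-256 is an assumption of this sketch.
    return hashlib.sha256(s + x).digest()


def keygen():
    s = secrets.token_bytes(16)
    sk = [[secrets.token_bytes(IN_BYTES) for _ in range(2)] for _ in range(K)]
    pk = (s, [[H(s, x) for x in pair] for pair in sk])
    return pk, sk


def sign(sk, m_bits):
    # Reveal x_{i, m_i} for every position i.
    return [sk[i][b] for i, b in enumerate(m_bits)]


def verify(pk, m_bits, sig):
    s, ys = pk
    return len(sig) == K and all(
        H(s, x) == ys[i][b] for i, (b, x) in enumerate(zip(m_bits, sig))
    )


pk, sk = keygen()
m0 = [0] * K
sig0 = sign(sk, m0)

# Tightness remark: after seeing a signature on 0^k, leaking the single
# value x_{1,1} is enough to forge a signature on 1 0^{k-1}.
leaked = sk[0][1]
m_forged = [1] + [0] * (K - 1)
sig_forged = [leaked] + sig0[1:]
```

The forged signature verifies because the two messages differ in a single position, which is exactly the weakness that the high-distance encoding introduced next is meant to remove.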


Tolerating leakage linear in the secret key length. An extension of the 
above scheme allows us to tolerate greater leakage. Specifically, we apply Lam- 
port’s scheme to a high-distance encoding of the message. Details follow. 

If A is a k × ℓ matrix over {0,1} (viewed as the field F_2), then A defines a
(linear) error-correcting code C ⊂ {0,1}^ℓ where the message m ∈ {0,1}^k (viewed
as a row vector) is mapped to the codeword m·A. It is well known that for every
ε > 0 there exists a constant R such that choosing A ∈ {0,1}^{k×Rk} uniformly
at random defines a code with relative minimum distance 1/2 − ε, except with
probability negligible in k. (We will not need efficient decodability.)

Fix a constant ε ∈ (0,1) and let R be as above; set ℓ = Rk. Let (Gen_H, H)
be a UOWHF mapping ℓ_in-bit inputs to k-bit outputs where ℓ_in = 2k/ε. The
signature scheme is defined as:


Key generation: Choose random A ∈ {0,1}^{k×ℓ} and x_{i,0}, x_{i,1} ← {0,1}^{ℓ_in} for
i = 1, ..., ℓ. Generate s ← Gen_H(1^k). Compute y_{i,b} := H_s(x_{i,b}) for i ∈
{1, ..., ℓ} and b ∈ {0,1}. The public key is (A, s, {y_{i,b}}) and the secret key
is {x_{i,b}}.

Signing: To sign a message m ∈ {0,1}^k, first compute m̄ = m·A ∈ {0,1}^ℓ.
The signature then consists of the ℓ values x_{1,m̄_1}, ..., x_{ℓ,m̄_ℓ}.

Verification: Given a signature x_1, ..., x_ℓ on the message m with respect to
the public key (A, s, {y_{i,b}}), first compute m̄ = m·A and then output 1 iff
y_{i,m̄_i} = H_s(x_i) for all i.
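The encoding step m ↦ m·A can be sketched in Python as follows. The parameters k = 6 and R = 8, the fixed seed, and the helper names are hypothetical toy choices; the brute-force minimum-distance computation is only feasible for such small k and is included purely for illustration.

```python
import random
from itertools import product


def random_matrix(k, ell, rng):
    return [[rng.randrange(2) for _ in range(ell)] for _ in range(k)]


def encode(m, A):
    """Row vector m times matrix A over F_2."""
    ell = len(A[0])
    return [sum(m[i] & A[i][j] for i in range(len(m))) % 2 for j in range(ell)]


def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))


def min_distance(A):
    """Brute-force minimum distance of the code (small k only). For a linear
    code this is the minimum weight of the encoding of a nonzero message."""
    k = len(A)
    return min(
        sum(encode(list(m), A)) for m in product([0, 1], repeat=k) if any(m)
    )


rng = random.Random(2009)
k, R = 6, 8
A = random_matrix(k, R * k, rng)

m1 = [1, 0, 1, 1, 0, 0]
m2 = [0, 0, 1, 0, 1, 0]
diff = [a ^ b for a, b in zip(m1, m2)]
positions_differing = hamming_distance(encode(m1, A), encode(m2, A))
```

The number of positions in which two encodings differ is the number of unrevealed secret values a forger would have to produce, which is why a large minimum distance translates into tolerance of more leakage.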


Theorem 2. If H is a UOWHF then the scheme above is a one-time signature
scheme that is fully (1/4 − ε)·n-leakage resilient, where n = 2ℓ·ℓ_in denotes the
length of the secret key.


Proof. Let Π denote the scheme given above, and let A be a PPT adversary
with δ = δ(k) := Pr[Succ^{λ-leakage}_{A,Π}(k)]. We construct an adversary B breaking
the security of H with probability at least (δ − negl(k))/4ℓ, implying that δ must
be negligible.

B chooses random A ∈ {0,1}^{k×ℓ} and x_{i,0}, x_{i,1} ← {0,1}^{ℓ_in} for i = 1, ..., ℓ; we
let X = {x_{i,b}} denote the set of secret key values B chooses and observe that
H_∞(X) = 2ℓ·ℓ_in. Next, B selects a random b* ∈ {0,1} and a random index
i* ∈ {1, ..., ℓ}, and outputs x_{i*,b*}; it is given in return a hash key s. Then B
computes y_{i,b} := H_s(x_{i,b}) for all i, b and gives the public key (A, s, {y_{i,b}}) to A.

B answers the signing and leakage queries of A using the secret key {x_{i,b}}
that it knows. Since this secret key is distributed identically to the secret key of
an honest signer, the simulation for A is perfect and A outputs a forgery with
probability δ.

Let m̄ denote the encoding of the message m whose signature was requested
by A. The information A has about the secret key X consists of: (1) the signature
(x_{1,m̄_1}, ..., x_{ℓ,m̄_ℓ}) it obtained; (2) the values {y_{i,1−m̄_i}}_{i=1}^ℓ from the public key;
and (3) the answers to the leakage queries asked by A. Together, these total
ℓ·ℓ_in + ℓ·k + (1/4 − ε)·2ℓ·ℓ_in bits. By Lemma 1, it follows that H_∞(X | A's view) >
(1/2 + ε)·ℓ·ℓ_in except with probability at most


2^{(ℓ·ℓ_in + ℓk + (1/2 − 2ε)·ℓ·ℓ_in) − 2ℓ·ℓ_in + (1/2 + ε)·ℓ·ℓ_in} = 2^{ℓk − ε·ℓ·ℓ_in},

which is negligible.
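The exponent in this bound can be checked with exact rational arithmetic. In the sketch below the concrete values k = 128, R = 10 and ε = 1/8 are hypothetical, chosen only to exercise the formula; the function name is likewise an assumption of the example.

```python
from fractions import Fraction


def leakage_exponent(k, R, eps):
    """Exponent lambda - H + H' from Lemma 1, instantiated for this proof."""
    ell = R * k
    ell_in = 2 * k / eps                      # l_in = 2k / eps
    # Bits known to A: signature, public-key values, and leakage.
    known = ell * ell_in + ell * k + (Fraction(1, 2) - 2 * eps) * ell * ell_in
    h = 2 * ell * ell_in                      # H_inf(X)
    h_prime = (Fraction(1, 2) + eps) * ell * ell_in
    return known - h + h_prime, ell


exponent, ell = leakage_exponent(k=128, R=10, eps=Fraction(1, 8))
```

With ℓ_in = 2k/ε the exponent simplifies to ℓk − ε·ℓ·ℓ_in = −ℓk, so the failure probability is 2^{−ℓk}.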
Assuming H_∞(X | A's view) > (1/2 + ε)·ℓ·ℓ_in, there is no set I ⊆ [ℓ] with
|I| ≥ (1/2 − ε)·ℓ for which the values {x_{i,1−m̄_i}}_{i∈I} are all fixed given A's view.
To see this, assume the contrary. Then

H_∞(X | A's view) ≤ Σ_{i∉I} H_∞(x_{i,1−m̄_i} | A's view) ≤ (1/2 + ε)·ℓ·ℓ_in,

in contradiction to the assumed bound on the conditional min-entropy of X.

Let (m*, (x*_1, ..., x*_ℓ)) denote the forgery output by A, and let m̄* = m*·A
denote the encoding of m*. Let I be the set of indices where m̄ and m̄* differ;
with all but negligible probability over choice of the matrix A it holds that
|I| ≥ (1/2 − ε)·ℓ and so we assume this to be the case. By the argument of the
previous paragraph, it cannot be the case that the {x_{i,1−m̄_i}}_{i∈I} are all fixed
given A's view. But then with probability at least half we have x*_i ≠ x_{i,m̄*_i} for
at least one index i ∈ I. Assuming this to be the case, with probability at least
1/2ℓ this difference occurs at the index (i*, b*) guessed at the outset by B; when
this happens B has found a collision in H for the given hash key s. Putting
everything together, we see that B finds a collision in H with probability at
least (δ − negl(k))·(1/2)·(1/2ℓ), as claimed.

A t-time signature scheme. The idea above can be further extended to give
a fully leakage resilient t-time signature scheme using cover-free families. We
follow the definition of [22].


Definition 3. A family of non-empty sets S = {S_1, ..., S_N}, where S_i ⊂ U, is
(t, 1/2)-cover-free if for all distinct S, S_1, ..., S_t ∈ S we have |S \ (∪_{i=1}^t S_i)| ≥ |S|/2.
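The cover-free condition can be checked by brute force on small families. The following Python sketch is illustrative only; the function name and the two toy families are hypothetical, and the exhaustive search is exponential in t.

```python
from itertools import combinations


def is_cover_free(family, t):
    """Check the (t, 1/2)-cover-free condition: every set keeps at least half
    of its elements after removing the union of any t other sets."""
    sets = [frozenset(s) for s in family]
    for idx, s in enumerate(sets):
        others = sets[:idx] + sets[idx + 1:]
        for group in combinations(others, t):
            covered = frozenset().union(*group)
            if 2 * len(s - covered) < len(s):
                return False
    return True


disjoint = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
overlapping = [{1, 2, 3}, {1, 2, 4}, {3, 4, 5}]
```

In the signature scheme, this property guarantees that even after t signatures have been revealed, at least half of the secret values needed to sign any new message remain hidden.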


Porat and Rothschild [82] show an explicit construction that, for any t and k,
yields a (t, 1/2)-cover-free family S = {S_1, ..., S_N} where the number of sets
is N = Ω(2^k), the size of each set is |S_i| = O(kt), and the universe size is
|U| = O(kt^2). If we let f : {0,1}^k → S denote an injective map, we obtain the
following scheme:

Key generation: Set ℓ = O(kt^2) and ℓ_in = 8tk. Choose x_i ← {0,1}^{ℓ_in} for i =
1, ..., ℓ. Generate s ← Gen_H(1^k), and compute y_i := H_s(x_i) for i ∈ {1, ..., ℓ}.
The public key is (s, {y_i}_{i=1}^ℓ) and the secret key is {x_i}_{i=1}^ℓ.

Signing: To sign a message m ∈ {0,1}^k, first compute f(m) = S_m ∈ S. The
signature then consists of {x_i}_{i∈S_m}.

Verification: Given a signature {x_i} on the message m with respect to the
public key (s, {y_i}), first compute S_m = f(m) and then output 1 iff y_i =
H_s(x_i) for all i ∈ S_m.


A proof of the following proceeds along exactly the same lines as the proof of
Theorem 2.


Theorem 3. If H is a UOWHF then the scheme above is a t-time signature
scheme that is fully O(n/t)-leakage resilient, where n = ℓ·ℓ_in denotes the length
of the secret key.


4.2 A Construction from Homomorphic Collision-Resistant Hashing 


Our second construction of fully leakage-resilient bounded-use signature schemes
relies on homomorphic collision-resistant hash functions, defined below. In
Section 4.3 we describe efficient instantiations of the hash functions we need based
on several concrete assumptions.
We concentrate on the case of one-time signatures, and defer a treatment of
t-time signatures to the full version.


Definition 4. Fix ε ∈ (0,1). A pair of PPT algorithms (Gen_H, H) is an
ε-homomorphic collision-resistant hash function family (ε-hCRHF) if:


1. Gen_H(1^k) outputs a key s that specifies groups (G, ⊕), (G', ⊗) (written
additively), and two sets S, T ⊆ G such that
– log |S| = ω(log k) and log |G'| ≤ ε·log |S| and log |T| ≤ (1 + ε) log |S|.
– S is efficiently sampleable, and elements of S can be represented using
log |S| + O(1) bits.
– T is efficiently recognizable, and {x + my | x, y ∈ S, 0 ≤ m < 2^k} ⊆ T.
2. The key s defines a function H_s : G → G' with H_s(x ⊕ y) = H_s(x) ⊗ H_s(y)
for all x, y ∈ G.
3. There exists a constant c (independent of k) for which the following holds.
For any s, any m, m' with 0 ≤ m < m' < 2^k, and any σ, σ':


|{x, y ∈ S | H_s(x + my) = σ ∧ H_s(x + m'y) = σ'}| ≤ 2^c.


4. No PPT algorithm A can find two distinct elements x, y ∈ T such that H_s(x) = H_s(y).
Namely, the following is negligible for all PPT A:


Pr[s ← Gen_H(1^k); (x, y) ← A(s) : x, y ∈ T ∧ x ≠ y ∧ H_s(x) = H_s(y)].


If the above holds even when A is given the randomness used to generate s,
then (Gen_H, H) is a strong ε-hCRHF.


Define a signature scheme as follows. 


Key generation: Compute s ← Gen_H(1^k); this specifies groups (G, ⊕), (G', ⊗)
and sets S, T. Choose x, y uniformly at random from S. Output sk := (x, y)
and pk := (s, H_s(x), H_s(y)).

Signing: The scheme is defined for messages m satisfying 0 ≤ m < 2^k. Given
m, output the signature σ := x ⊕ my.

Verification: Given a signature σ on the message m with respect to the public
key pk = (s, a, b), output 1 iff σ ∈ T and H_s(σ) = a ⊗ mb.


Theorem 4. If (Gen_H, H) is a (strong) ε-hCRHF, then the above is a one-time
signature scheme that is (fully) (1/2 − 2ε)·n-leakage resilient.


Proof. Correctness is easily verified. Let Π denote the scheme given above, and
let A be a PPT adversary with δ = δ(k) := Pr[Succ^{λ-leakage}_{A,Π}(k)]. We construct
an adversary B breaking the security of (Gen_H, H) with probability at least
δ/2 − negl(k), implying that δ must be negligible.

B is given as input a key s (along with the randomness used to generate
it). B chooses x, y ∈ S, sets sk := (x, y), and gives the public key pk :=
(s, H_s(x), H_s(y)) to A. Algorithm B then answers the signing and leakage queries
of A using the secret key (x, y) that it knows. Since this secret key is distributed
identically to the secret key of an honest signer, the simulation for A is perfect
and A outputs a valid forgery (m', σ') with probability δ. If this occurs, then B
outputs (σ', x ⊕ m'y) as a candidate collision for H_s.

Note that x ⊕ m'y ∈ T. If σ' is a valid signature on m', we have σ' ∈ T and


H_s(σ') = H_s(x) ⊗ m'H_s(y) = H_s(x ⊕ m'y).


It remains to show that σ' ≠ x ⊕ m'y with significant probability.


Let $c$ be the constant guaranteed to exist by condition 3 of the definition of an $\epsilon$-hCRHF. The length of the secret key is $n = 2\log|S|$ bits.$^{6}$ The information $A$ has about $sk = (x, y)$ consists of: (1) the signature $x \oplus my$ it obtained; (2) the values $H_s(x), H_s(y)$ from the public key; and (3) the answers to the leakage queries asked by $A$. These total at most


$$\log|T| + 2\log|G'| + \left(\tfrac{1}{2} - 2\epsilon\right) \cdot 2\log|S| \;<\; (1 + \epsilon)\log|S| + 2\epsilon\log|S| + \log|S| - 4\epsilon\log|S| \;=\; 2\log|S| - \epsilon\log|S|$$


bits of information about $sk$. The min-entropy of $sk$ is $2\log|S|$ bits, so by Lemma [] it follows that $H_\infty(sk \mid A\text{'s view}) > c + 1$ except with probability at most $2^{-\epsilon\log|S| + c + 1}$, which is negligible.
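
The bookkeeping in this bound can be checked mechanically. The sketch below is only an illustration: the function names and the sample values of $\epsilon$, $\log|S|$ and $c$ are assumptions made here, not taken from the paper. It expresses each term as a multiple of $\log|S|$ using exact rationals and confirms that the total is $2 - \epsilon$, leaving $\epsilon \log|S|$ bits of slack.

```python
from fractions import Fraction

def known_bits_coefficient(eps):
    """Upper bound on the bits A learns about sk, as a multiple of log|S|."""
    signature = 1 + eps                       # log|T|    < (1 + eps) * log|S|
    public_key = 2 * eps                      # 2*log|G'| < 2 * eps * log|S|
    leakage = (Fraction(1, 2) - 2 * eps) * 2  # (1/2 - 2*eps) * n, with n = 2*log|S|
    return signature + public_key + leakage

def log2_failure_probability(eps, log_size_S, c):
    """Exponent of the bound 2^(-eps*log|S| + c + 1) on the bad event."""
    return -eps * log_size_S + c + 1

eps = Fraction(1, 8)                 # hypothetical value of epsilon
coeff = known_bits_coefficient(eps)
print(coeff)                         # 15/8
print(2 - coeff == eps)              # True: eps * log|S| bits remain
print(log2_failure_probability(eps, 256, 0))  # -31
```

With these sample numbers the bad event has probability at most $2^{-31}$; the exponent shrinks linearly in $\log|S|$.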

Assuming $H_\infty(sk \mid A\text{'s view}) > c + 1$, we claim that for any $m' \neq m$ (with $0 < m < 2^k$) the value $x \oplus m'y$ has min-entropy at least 1; this follows from the fact that, for any fixed $\sigma'$, the two equations $\sigma = x \oplus my$ and $\sigma' = x \oplus m'y$ constrain $(x, y)$ to a set of size at most $2^c$ (by condition 3 of the definition of an $\epsilon$-hCRHF). Thus, $\sigma' = x \oplus m'y$ with probability at most $1/2$. Putting everything together, we see that $B$ finds a collision in $H_s$ with probability at least $(\delta - \mathrm{negl}(k)) \cdot \frac{1}{2}$, as claimed.


4.3 Constructing (Strong) Homomorphic CRHFs 


Homomorphic CRHFs can be constructed from a variety of standard assumptions. Here, we describe constructions based on the discrete logarithm and the RSA assumptions; in the full version, we show a construction based on lattices. All except the RSA-based construction are strong $\epsilon$-hCRHFs.


An instantiation based on the discrete logarithm assumption. Let $G'$ be a group of prime order $p > 2^k$ where the discrete logarithm problem is hard. Let $\ell$ = [2], and set $S = T = G = \mathbb{Z}_p^{\ell}$.

The key-generation algorithm $\mathsf{Gen}_H$ outputs random $g_1, \ldots, g_\ell \in G'$ as the key. Given $s = (g_1, \ldots, g_\ell)$, define $H_s(x_1, \ldots, x_\ell) = \prod_i g_i^{x_i}$. This function is clearly homomorphic, and collision resistance follows by standard arguments.
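
To make the construction concrete, here is a minimal sketch of the one-time signature scheme instantiated with this discrete-logarithm hash. Everything specific in it is an assumption made for illustration: $G'$ is realized as the subgroup of squares modulo the toy safe prime 2039 (prime order 1019), the vector length `ELL = 4` and message bound `K = 9` are arbitrary, and all function names are hypothetical. The parameters are far too small to be secure.

```python
import secrets

P = 2039   # toy safe prime, P = 2*Q + 1 (assumption, insecure size)
Q = 1019   # prime order of the subgroup of squares mod P; plays the role of p
ELL = 4    # hypothetical vector length, standing in for the paper's l
K = 9      # messages satisfy 0 < m < 2^K, and 2^K < Q

def gen_hash_key():
    """Sample g_1..g_ELL of order Q: squares of values other than +-1."""
    return [pow(secrets.randbelow(P - 3) + 2, 2, P) for _ in range(ELL)]

def H(s, v):
    """H_s(x_1..x_l) = prod_i g_i^{x_i} mod P."""
    out = 1
    for g, e in zip(s, v):
        out = out * pow(g, e, P) % P
    return out

def keygen():
    s = gen_hash_key()
    x = [secrets.randbelow(Q) for _ in range(ELL)]
    y = [secrets.randbelow(Q) for _ in range(ELL)]
    return (x, y), (s, H(s, x), H(s, y))

def sign(sk, m):
    """sigma = x + m*y, computed coordinate-wise in Z_Q."""
    if not 0 < m < 2 ** K:
        raise ValueError("message out of range")
    x, y = sk
    return [(xi + m * yi) % Q for xi, yi in zip(x, y)]

def verify(pk, m, sigma):
    """Accept iff sigma lies in T = Z_Q^ELL and H_s(sigma) = a * b^m."""
    s, a, b = pk
    in_T = len(sigma) == ELL and all(0 <= c < Q for c in sigma)
    return in_T and H(s, sigma) == a * pow(b, m, P) % P

sk, pk = keygen()
sigma = sign(sk, 5)
print(verify(pk, 5, sigma))   # True
```

Since $S = T = G$ here, the membership test in `verify` reduces to a range check on each coordinate.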


An instantiation based on the RSA assumption. Fix $\ell$ = [2]. On security parameter $k$, algorithm $\mathsf{Gen}_H(1^k)$ chooses safe primes $p = 2p' + 1$ and $q = 2q' + 1$ with $p', q' > 2^k$, and sets $N = pq$. (The primes $p$ and $q$ are not used after key generation, but because they are in memory during key generation this construction is not strong.) $\mathsf{Gen}_H$ then chooses a random element $u \in \mathbb{Z}_N^*$, as well as a prime e > 2+)". The key is $s = (N, e, u)$.

6 We assume for simplicity that elements of $S$ can be described using exactly $\log|S|$ bits; the proof can be modified suitably if this is not the case.

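Let $G = \mathbb{Z}_N^* \times \mathbb{Z}$ and $G' = \mathbb{Z}_N^*$. Define

$$H_s(r, x) = r^e \cdot u^x \bmod N.$$

Take $S = \mathrm{QR}_N \times$ {0,...,2} $\subset G$ (where $\mathrm{QR}_N$ denotes the set of quadratic residues modulo $N$) and $T = \mathbb{Z}_N^* \times$ {0,..., 24+) -*}.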
The homomorphic property of $H_s$ is easy to see. One can also verify that:

1. $\log|S| = \omega(\log k)$ and $\log|G'| < \epsilon \cdot \log|S|$ and $\log|T| < (1 + \epsilon)\log|S|$.
2. $T$ is efficiently recognizable, and $\{x + my \mid x, y \in S,\ 0 < m < 2^k\} \subseteq T$.
3. For any $s$, any $m, m'$ with $0 < m < m' < 2^k$, and any $\sigma, \sigma'$:

$$\left|\{x, y \in S \mid H_s(x + my) = \sigma \wedge H_s(x + m'y) = \sigma'\}\right| \le 1.$$

(This uses the fact that $\mathrm{QR}_N \cong \mathbb{Z}_{p'} \times \mathbb{Z}_{q'}$ has no elements other than the identity whose order is less than $2^k$.)


Collision resistance follows via standard arguments (e.g., BJJ). 
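
The homomorphic property of the RSA-based function can be checked numerically. The sketch below is only an illustration under stated assumptions: the safe primes 23 and 47, the exponent `E = 101`, the element `U = 5`, and the helper names `combine` and `scale` are all hypothetical choices made here, and the toy values do not respect the size requirements that the construction places on $e$, $S$ and $T$.

```python
P_SAFE, Q_SAFE = 23, 47   # toy safe primes: 23 = 2*11 + 1, 47 = 2*23 + 1
N = P_SAFE * Q_SAFE       # modulus N = pq = 1081
E = 101                   # hypothetical prime exponent e
U = 5                     # hypothetical element u of Z_N^*

def H(elem):
    """H_s(r, x) = r^e * u^x mod N."""
    r, x = elem
    return pow(r, E, N) * pow(U, x, N) % N

def combine(a, b):
    """Group operation on G = Z_N^* x Z: multiply first parts, add second parts."""
    return (a[0] * b[0] % N, a[1] + b[1])

def scale(m, a):
    """The m-fold combination of a with itself."""
    return (pow(a[0], m, N), m * a[1])

x = (4, 7)    # 4 = 2^2 is a quadratic residue modulo N
y = (9, 12)   # 9 = 3^2 is a quadratic residue modulo N
m = 6

lhs = H(combine(x, scale(m, y)))      # hash of the signature x + m*y
rhs = H(x) * pow(H(y), m, N) % N      # a * b^m computed from the public key
print(lhs == rhs)                     # True
```

The equality holds for any choice of `x`, `y` and `m`, because both sides expand to $r_1^e \, r_2^{em} \, u^{x_1 + m x_2} \bmod N$.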